Abstract

Given a set $F$ of $n$ positive functions over a ground set $X$, we consider the problem of computing $x^*$ that minimizes the expression $\sum_{f\in F} f(x)$ over $x \in X$. A typical application is shape fitting, where we wish to approximate a set $P$ of $n$ elements (say, points) by a shape $x$ from a (possibly infinite) family $X$ of shapes. Here, each point $p \in P$ corresponds to a function $f$ such that $f(x)$ is the distance from $p$ to $x$, and we seek a shape $x$ that minimizes the sum of distances from each point in $P$. In the $k$-clustering variant, each $x \in X$ is a tuple of $k$ shapes, and $f(x)$ is the distance from $p$ to its closest shape in $x$.

Our main result is a unified framework for constructing coresets and approximate clusterings for such general sets of functions. To achieve our results, we forge a link between the classic and well-defined notion of $\varepsilon$-approximations from the theory of PAC learning and VC dimension, and the relatively new (and not so consistent) paradigm of coresets, which are a kind of "compressed representation" of the input set $F$. Using traditional techniques, a coreset usually implies an LTAS (linear-time approximation scheme) for the corresponding optimization problem, which can be computed in parallel, via one pass over the data, and using only polylogarithmic space (i.e., in the streaming model).

For several function families $F$ for which coresets are known not to exist, or the corresponding (approximate) optimization problems are hard, our framework yields bicriteria approximations, or coresets that are large, but contained in a low-dimensional space.

We demonstrate our unified framework by applying it to projective clustering problems. We obtain new coreset constructions, and coresets significantly smaller than those that have appeared in the literature in recent years, for problems such as:

• $k$-Median [Har-Peled and Mazumdar, STOC'04], [Chen, SODA'06], [Langberg and Schulman, SODA'10];

• $k$-Line median [Feldman, Fiat and Sharir, FOCS'06], [Deshpande and Varadarajan, STOC'07];

• Projective clustering [Deshpande et al., SODA'06], [Deshpande and Varadarajan, STOC'07];

• Linear $\ell_p$ regression [Clarkson, Woodruff, STOC'09];

• Low-rank approximation [Sarlos, FOCS'06];

• Subspace approximation [Shyamalkumar and Varadarajan, SODA'07], [Feldman, Monemizadeh, Sohler and Woodruff, SODA'10], [Deshpande, Tulsiani, and Vishnoi, SODA'11].

The running times of the corresponding optimization problems are also significantly improved. We show how to generalize the results of our framework to squared distances (as in $k$-mean), distances to the $z$th power, and deterministic constructions.
1 Introduction



Over the last couple of decades, much effort has been put into understanding the combinatorial and computational complexity of a wide range of clustering and shape fitting problems. Given a set $P$ of $n$ data elements, one of the powerful techniques used in this context is that of coresets, i.e., a small set $D$ of representative data elements that approximately represents $P$ in terms of various objective measures. More precisely, for a set of candidate queries $X$ and a measure function $\mathrm{cost}(\cdot, x)$, the set $D$ is an $\varepsilon$-coreset for $P$ if $\mathrm{cost}(D, x)$ approximates $\mathrm{cost}(P, x)$ for every $x \in X$, up to a multiplicative factor of $1 \pm \varepsilon$. See, e.g., [AHPV05] for a nice (if somewhat dated) survey.

Succinct coresets that lead to efficient algorithms appear in a variety of shape fitting and clustering problems. However, their proofs of existence and their efficient constructions are usually tailor-made to fit the properties of the problem at hand. Moreover, there are several natural clustering problems for which it is proven that no coresets of size $o(n)$ exist. These include, for example, approximating points in $\mathbb{R}^d$ by a pair of planes [HP04], clustering weighted points by a set of 2 lines [HP06], and approximating a point set by $k$ lines [HP06], where $k > \log n$. These kinds of clustering problems are usually referred to as projective clustering.

1.1 This work 

Let $F$ be a set of $n$ functions from $X$ to $[0, \infty)$. Throughout this work, each function $f \in F$ will correspond to a data element, and $x \in X$ will correspond to a center (or a set of centers). For a center $x \in X$, the value $f(x)$ corresponds to the cost of evaluating $f$ with the center $x$. The cost of evaluating $F$ with $x \in X$ is defined as $\mathrm{cost}(F, x) = \sum_{f\in F} f(x)$.

Intuitively, the cost function should be interpreted in the context of shape fitting, where $X$ represents a set of shapes, and $f(x)$ represents the cost of fitting an element represented by $f$ to the shape $x$. For a given query shape $x \in X$, the value $\mathrm{cost}(F, x)$ represents how well $x$ approximates $F$. In the context of $k$-clustering, the "center" $x$ represents a tuple of $k$ centers, and $f(x)$ represents the distance from an element $f$ to its closest center in $x$. For example, in the well-known $k$-median problem in $\mathbb{R}^d$, the corresponding set $X$ is $(\mathbb{R}^d)^k$. For a data element $p \in \mathbb{R}^d$ and a center tuple $x = (x_1, \ldots, x_k) \in (\mathbb{R}^d)^k$, the corresponding function $f_p$ is defined as $f_p(x) = \min_i \mathrm{dist}(p, x_i)$.
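As a minimal illustration of these definitions (a sketch, not the paper's implementation), the following Python code evaluates $f_p(x) = \min_i \mathrm{dist}(p, x_i)$ and $\mathrm{cost}(F, x)$ for a $k$-tuple of point centers in the plane. The helper names `dist`, `f_p`, and `cost` are hypothetical and chosen only for this example.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def dist(p: Point, q: Point) -> float:
    """Euclidean distance between two points of R^d."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def f_p(p: Point, x: Sequence[Point]) -> float:
    """f_p(x): distance from the data point p to its closest center in the k-tuple x."""
    return min(dist(p, xi) for xi in x)


def cost(P: Sequence[Point], x: Sequence[Point]) -> float:
    """cost(F, x) = sum of f_p(x) over the functions f_p, one per data point p."""
    return sum(f_p(p, x) for p in P)


# Four points on a line and a 2-tuple of centers (k = 2, d = 2).
P = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
x = [(0.5, 0.0), (10.5, 0.0)]
print(cost(P, x))  # prints 2.0: each point is at distance 0.5 from its closest center
```

Each data element contributes one function, and a single query $x$ is a whole tuple of centers; this is the view the framework builds on.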

In this work, we present a unified framework for the efficient construction of coresets for clustering problems corresponding to a given function set $F$. Our coresets are obtained via a new and natural reduction to the well-studied notion of $\varepsilon$-approximation from the theory of VC dimension [VC71]. The reduction from coresets to $\varepsilon$-approximations allows our framework to rely only on the combinatorial complexity of the input family $F$ of functions (i.e., the combinatorial complexity of the clustering problem at hand), and to use the vast literature on $\varepsilon$-approximations to obtain improved results (that are at times deterministic). For several function families $F$ for which coresets are known not to exist, or the corresponding (approximate) optimization problems are hard, our framework yields bicriteria approximations, or coresets that are large, but contained in a low-dimensional space.

In the body of the paper, we give an overview of the contributions of our work. We start by presenting, in Section 2, several concrete results that follow from our algorithmic paradigm, including a detailed comparison with corresponding previous work. We then present the main proof techniques and conceptual novelties of our approach in Section 3. Finally, in Section 4, we present a detailed overview of our algorithms for the construction of coresets and bicriteria approximations. This discussion makes up the body of this extended abstract. All of the technical details of our results appear in the (self-contained) appendix. A first application of our framework (for HD-image processing) has already appeared in [FFS11].
2 Concrete Contributions



2.1 Projective clustering 

Our concrete results are taken from the broad family of projective clustering problems. In the task of projective clustering, we are given a set $P \subset \mathbb{R}^d$ of $n > d$ data elements, a positive integer $k < n$, and a non-negative integer $j < d$. A center $x \in X$ is a $k$-tuple $(x_1, \ldots, x_k)$, where each $x_i$ is a $j$-dimensional affine subspace (flat) in $\mathbb{R}^d$. The objective is to find a center $x^*$ that minimizes $\mathrm{cost}(P, x) = \sum_{p\in P} \mathrm{dist}(p, x)$ over $x \in X$. Here, $\mathrm{dist}(p, x)$ denotes the Euclidean distance from a point $p$ to its nearest subspace $x_i$ in $x = (x_1, \ldots, x_k)$. More generally, for a given $z > 1$, we wish to minimize the sum of distances to the power of $z$, i.e., $\sum_{p\in P} (\mathrm{dist}(p, x))^z$. In this section we define three types of coresets for projective clustering:

Strong coresets: A weighted set of points $D$ in $\mathbb{R}^d$ that approximates the distances to every possible $k$-tuple of $j$-flats in $\mathbb{R}^d$, up to a multiplicative factor of $(1 + \varepsilon)$.

Weak coresets: A weighted set of points $D$ in $\mathbb{R}^d$ such that a $(1+\varepsilon)$-approximation for the optimal solution of $D$ yields a $(1+\varepsilon)$-approximation for the optimal solution of the full data set $P$. That is, any black-box algorithm or heuristic that computes a $(1+\varepsilon)$-approximation for the coreset yields a $(1+\varepsilon)$-approximation for the original set. Hence, a weak coreset can be viewed as a reduction from the clustering problem with input $P$ to the same problem with input $D$. We note that in previous papers (e.g., [FMS07, FMSW10]) the only way to get a PTAS for the original set was to run exhaustive search on the coreset.

Streaming coresets: A weak coreset $D$ that is updated online during one pass over the $n$ points of $P$, while using only $O(d \cdot |D|)$ space in memory. Streaming coresets can thus be used online to compute a $(1+\varepsilon)$-approximation for the optimal solution of the points in $P$ seen so far.

All the algorithms that are described in this section are randomized, and succeed with probability at least 
1/2 (or any other constant approaching 1). 
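The projective clustering objective $\sum_{p\in P}(\mathrm{dist}(p, x))^z$ defined at the start of this section can be made concrete with the following minimal Python sketch. It assumes that each $j$-flat is given by a point on it together with an orthonormal basis of its direction space (a representation chosen here for illustration only); for $j = 0$ the basis is empty and the flat is a single point. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]
Flat = Tuple[Vec, List[Vec]]  # (a point on the flat, orthonormal basis of its directions)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def dist_to_flat(p: Vec, flat: Flat) -> float:
    """Euclidean distance from p to the affine flat origin + span(basis)."""
    origin, basis = flat
    r = [a - b for a, b in zip(p, origin)]  # p expressed relative to the flat
    for u in basis:  # subtract the component lying inside the flat (basis is orthonormal)
        c = dot(r, u)
        r = [ri - c * ui for ri, ui in zip(r, u)]
    return math.sqrt(dot(r, r))


def projective_cost(P: Sequence[Vec], x: Sequence[Flat], z: float = 1.0) -> float:
    """Sum over p of dist(p, x)^z, where dist(p, x) is the distance to the nearest flat of x."""
    return sum(min(dist_to_flat(p, flat) for flat in x) ** z for p in P)


# k = 2 lines (j = 1) in R^3: the x-axis, and the line through (5, 0, 4) parallel to the y-axis.
x = [((0.0, 0.0, 0.0), [(1.0, 0.0, 0.0)]),
     ((5.0, 0.0, 4.0), [(0.0, 1.0, 0.0)])]
P = [(5.0, 3.0, 4.0), (2.0, 0.0, 0.0), (0.0, 0.0, 3.0)]
print(projective_cost(P, x, z=1), projective_cost(P, x, z=2))  # prints 3.0 9.0
```

The first two points lie on one of the lines; only the third point, at distance 3 from the x-axis, contributes to the cost.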

Roughly speaking, the results given in this section are specific applications of our framework, which, for general values of $j$, yields a bicriteria approximation $B$ for the projective clustering problem, followed by a so-called $B$-coreset $D = \mathrm{proj}(P, B) \cup S$. Here, a bicriteria approximation is a set of possibly more than $k$ centers that approximates the cost of the optimal solution $x^*$ up to some constant factor. The set $\mathrm{proj}(P, B)$ denotes the projection of the data set $P$ onto the bicriteria centers $B$, and $S$ is a set of $t$ points. Our sets $D$ have the qualitative properties of coresets. Namely, for $t = O(djk/\varepsilon^2)$ the set $D$ we obtain is a strong coreset, for $t = O(kj^{O(1)}\log(1/\varepsilon)/\varepsilon^2)$ we obtain weak coresets, and for $t = O(kj^{O(1)}\log(1/\varepsilon)\log^{O(1)} n/\varepsilon^2)$ we obtain streaming coresets.

Our $B$-coresets are constructed as the union of the two sets $S$ and $\mathrm{proj}(P, B)$. While $S$ is of small size $t$, the set $\mathrm{proj}(P, B)$ may be large. Nevertheless, our coresets are of substantial interest, as they imply a dimension reduction from the set $P$ to the set $\mathrm{proj}(P, B)$. Indeed, when our centers are points (i.e., $j = 0$), we are able to find a set $B$ of size $k$, so $\mathrm{proj}(P, B)$ is also of size $k$. When our centers are lines (i.e., $j = 1$), the set $\mathrm{proj}(P, B)$ is contained in a small set of lines, and we use [FFS06] to reduce the size of $\mathrm{proj}(P, B)$ to $(\varepsilon^{-1}\log n)^{O_k(1)}$, where the exponent depends only on $k$. We discuss these cases and others (derived from our framework) in the subsections to come.
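For point centers ($j = 0$), the projection $\mathrm{proj}(P, B)$ replaces each point by its nearest center, so it collapses to at most $|B|$ distinct points, each carrying a weight. The following minimal Python sketch shows only this projection step under that assumption; it does not construct the sampled set $S$, and the name `project_onto_centers` is hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def project_onto_centers(P: Sequence[Point], B: Sequence[Point]) -> Counter:
    """proj(P, B) for point centers: map each p to its nearest center in B.

    Returns a weighted set, i.e., a Counter from center to the number of points sent to it.
    Ties are broken in favor of the center listed first in B.
    """
    return Counter(min(B, key=lambda b: math.dist(p, b)) for p in P)


P = [(0.0,), (1.0,), (2.0,), (9.0,), (10.0,)]
B = [(1.0,), (10.0,)]
proj = project_onto_centers(P, B)
print(proj)  # at most len(B) distinct points, each with a multiplicity (weight)
```

The total weight still equals $|P|$, which is why the projected set can stand in for $P$ in the cost while occupying only $|B|$ distinct locations.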

The construction time of the strong and weak coresets is $O(ndjk + t\log n)$. All our coresets and running times below generalize to the sum of distances to the power of $z > 1$, after replacing the term $\varepsilon$ in the corresponding results by $\varepsilon^{O(z)}$.
2.2 $k$-Median and its generalizations

We start by discussing the setting in which the centers $X$ are $k$-tuples of points in $\mathbb{R}^d$ (i.e., $j = 0$).

Strong coresets: For the case $j = 0$ and $z = 1$, which is the standard $k$-median problem, we present a strong coreset of size $t = \tilde O(dk/\varepsilon^2)$. This improves on previous results in [HPK07, Che06, LS10], where the construction of $\varepsilon$-coresets of size $\tilde O(k^2\varepsilon^{-d})$, $\tilde O(k^2 d\varepsilon^{-2}\log n)$, and $\tilde O(d^2k^3\varepsilon^{-2})$, respectively, is presented. The notation $\tilde O(x)$ hides factors that are polylogarithmic in $x$.

For general metric spaces (i.e., when $\mathrm{dist}(p, x)$ is defined as the distance between $p$ and $x$ in the given metric), the dimension $d$ is replaced by $\log n$, implying strong coresets of size $t = \tilde O(k\log(n)/\varepsilon^2)$. This improves on the result of Ke Chen [Che06], which gives a coreset of size $O(k^2\log(n)/\varepsilon^2)$ for this problem. Both our results and those of [LS10] generalize to cost functions that use a power $z$ of the distance, namely $\mathrm{cost}(P, x) = \sum_{p\in P}(\mathrm{dist}(p, x))^z$.

Weak coresets. For the $k$-median problem, our framework yields a weak coreset $D$ of size $O(k\log(1/\varepsilon)/\varepsilon^2)$. By computing a $(1+\varepsilon)$-approximation to the $k$-median of $D$, we are able to compute a set of $k$ centers that gives a $(1+\varepsilon)$-approximation to the optimal centers for $P$ in time $O(ndk + 2^{\mathrm{poly}(k/\varepsilon)})$. Our results generalize to any integer $z > 1$ by replacing $\varepsilon$ with $\varepsilon^{O(z)}$ in the corresponding time and space terms.

For the cases $z = 1, 2$ (median and mean problems), Ke Chen [Che06] suggested an $O(ndk) + \mathrm{poly}(d, \log n)\cdot 2^{\mathrm{poly}(k/\varepsilon)}$ PTAS. For the $k$-mean case ($z = 2$), Feldman, Monemizadeh and Sohler [FMS07] improved this result using a weak coreset of size $O(k\log^{O(1)} k\,\log(1/\varepsilon)/\varepsilon^5)$, which yields a PTAS that takes time $O(ndk) + d\cdot\mathrm{poly}(k/\varepsilon) + 2^{\tilde O(k/\varepsilon)}$.

Streaming coresets. Our framework yields streaming coresets of size $t = O(k\log(1/\varepsilon)\log^{O(1)}(n)/\varepsilon^2)$ for $k$-median and its generalizations for $z > 1$. This improves on the result of Ke Chen [Che06], which suggests a streaming coreset of size $O(dk^2\varepsilon^{-2}\log^{O(1)} n)$ for $z = 1, 2$. We note that Feldman, Monemizadeh and Sohler [FMS07] present a streaming coreset of size $\mathrm{poly}(k\log n/\varepsilon)$ for the special case of $k$-mean ($z = 2$). To the best of our knowledge, no streaming coresets of size independent of $d$ were known for the case $z > 2$.

2.3 $k$-Line median and its generalizations

In this case, we seek to cluster the points in $P$ by $k$ lines in $\mathbb{R}^d$ (i.e., we take $j = 1$). Very little is known about this problem in high-dimensional space.

Strong coresets. Combining our results with techniques presented in [FFS06], we obtain strong coresets for this problem of size $(\log(n)/\varepsilon)^{O_k(1)} + O(dk/\varepsilon^2)$, where the exponent depends only on $k$. This improves on the previous work of [FFS06], which for $z = 1, 2$ introduces coresets whose size is exponential in $O(d\log d + k)$.

Weak coresets. The best PTAS (prior to our work) for this problem takes time $dn\cdot\mathrm{poly}(k/\varepsilon) + n(\log n)^{\mathrm{poly}(k/\varepsilon)}$; see [DV07]. We suggest a weak coreset for this problem of size $(\log(n)/\varepsilon)^{O_k(1)}$, where the exponent depends only on $k$, which improves the running time of this result to $O(ndk) + (\log n)^{\mathrm{poly}(k/\varepsilon)}$.

Streaming coresets. We construct the first streaming coreset for this problem. Its size is $(\log(n)/\varepsilon)^{O_k(1)}$, where the exponent depends only on $k$.

2.4 Subspace approximation 

In the problem of subspace approximation one seeks a single j-flat that approximates the data set P (i.e., in 
our notation k = 1). 

Strong coresets. We suggest a strong coreset of size $t = O(dj/\varepsilon^2)$ for any $j > 1$. This is the first strong coreset of size polynomial in $d$ for approximating the sum of distances to any $j$-dimensional subspace. In [FFS06], a strong coreset of size $(1/\varepsilon)^{\mathrm{poly}(j,d)}\cdot\log^{O(j)} n$ is constructed in time linear in $nd$, with a multiplicative factor that depends only on $j$.
For the case $z = 2$ and $j = d - 1$ (sum of squared distances to a hyperplane), Batson, Spielman and Srivastava [BSS09] recently proved that there is a coreset of size $O(d/\varepsilon^2)$ which is also a weighted subset of $P$. Many applications of this construction were suggested in [Nao11]. Such a coreset can be constructed directly from Theorem 4.1 below in time $O(nd^2 + d/\varepsilon^2)$, with high probability, while [BSS09] provide a deterministic construction in $O(nd^{O(1)}/\varepsilon^2)$ time. Unlike the above constructions, our results generalize to any $z > 1$ and $j < d - 1$, where $\varepsilon$ is replaced by $\varepsilon^{O(z)}$ in the running time and coreset size. Deterministic constructions of such coresets can be computed in time $n \cdot (\ldots)$ using the derandomization technique of [Mat95].

Weak coresets. We obtain a weak coreset of size $O(j^{O(1)}\log(1/\varepsilon)/\varepsilon^2)$ for the subspace approximation problem that yields an $O(dnj) + 2^{\mathrm{poly}(j/\varepsilon)}$ time PTAS. A result of Shyamalkumar and Varadarajan [SV07] and subsequent work by Deshpande and Varadarajan [DV07] gave a $(1+\varepsilon)$-approximation algorithm for the case $z > 1$, with running time $dn\cdot\exp(j, 1/\varepsilon)$, i.e., exponential in $j$ and $1/\varepsilon$. For the case $z = 1$, the running time was recently improved to $O(dn\cdot\mathrm{poly}(j, 1/\varepsilon) + (d + n)\exp(j, 1/\varepsilon))$ by Feldman, Monemizadeh, Sohler and Woodruff [FMSW10].

Streaming coresets. Our streaming coresets for subspace approximation are of size $t = O(j^{O(1)}\log(1/\varepsilon)\log^{O(1)} n/\varepsilon^2)$, and thus use $O(d\cdot t)$ space. Sarlos [Sar06] provides a streaming algorithm that requires two passes over the data and uses space $O(n)\cdot(k/\varepsilon + k\log k)^{O(1)}$.

For the case of non-constant $j$, Deshpande, Tulsiani, and Vishnoi recently showed that computing a PTAS for this problem is "hard" [AD11]. However, they suggested a constant factor approximation using a relaxation to convex programming, which takes time $d\cdot\mathrm{poly}(n)$. Applying this algorithm to the output coresets of our framework would thus yield a constant factor approximation in $O(dn + d\cdot\mathrm{poly}(j))$ time, together with strong and streaming coresets.

CUR Decomposition. Given $j > 1$ and an $n\times d$ matrix $A$, the CUR decomposition $\tilde A = CUR$ consists of an $n\times m$ matrix $C$, an $m\times j$ matrix $U$, and a $j\times d$ matrix $R$, such that: (i) the columns of $C$ are a subset of the columns of $A$, and the rows of $R$ are a subset of the rows of $A$; (ii) $\tilde A$ minimizes $\sum_{i=1}^{n}\|\tilde a_i - a_i\|_2$ over every matrix $\tilde A$ of rank $j$, up to a multiplicative factor of $(1+\varepsilon)$. Here, $\tilde a_i$ and $a_i$ are the $i$th rows of $\tilde A$ and $A$, respectively.

For the case $z = 2$, Boutsidis et al. [BDMI11] provide $(2+\varepsilon)$ randomized and deterministic CUR decompositions using $m = O(j/\varepsilon)$ columns. They also provide an up-to-date reference for this long line of research. Mahoney and Drineas suggested a randomized algorithm that yields a $(1+\varepsilon)$-approximation for the case $z = 2$ [MD09].

To the best of our knowledge, the CUR decomposition has not been discussed for $z \neq 2$ or for the streaming model. Since all the approximating $j$-subspaces described in this paper are spanned by $\mathrm{poly}(j/\varepsilon)$ input points, our coresets yield a corresponding $(1+\varepsilon)$-approximation for the CUR decomposition in these cases, using the observations from [MD09].

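Linear regression. In the $\ell_1$ regression problem, the input is an $n\times(d-1)$ matrix $A$ and a vector $b \in \mathbb{R}^n$. The goal is to minimize $\|Ay - b\|_1$ over all $y \in \mathbb{R}^{d-1}$. By defining a set $P$ of $n$ points in $\mathbb{R}^d$ that correspond to the rows of the matrix $[A \mid b]$, and mapping any vector $y \in \mathbb{R}^{d-1}$ to the hyperplane $x$ through the origin that is orthogonal to the vector $[y^T, -1]^T$, it is easy to verify that a strong coreset for the subspace approximation of $P$ with $j = d - 1$ yields a strong coreset for the corresponding linear regression problem for $A, b$. Indeed, for the $i$th row $a_i$ of $A$, the distance from $p_i = [a_i, b_i]$ to $x$ is $|a_i y - b_i|/\|[y^T, -1]\|_2$, so $\|Ay - b\|_1$ equals $\sum_{p\in P}\mathrm{dist}(p, x)$ multiplied by $\|[y^T, -1]\|_2$, a factor that does not depend on $p$.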
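The reduction from $\ell_1$ regression to hyperplane fitting can be checked numerically. The following Python sketch (hypothetical helper names and a toy instance, for illustration only) computes $\|Ay - b\|_1$ directly, and again as $\|[y^T, -1]\|_2$ times the sum of distances from the points $[a_i, b_i]$ to the hyperplane through the origin orthogonal to $[y^T, -1]^T$.

```python
import math
from typing import List, Sequence


def l1_residual(A: Sequence[Sequence[float]], b: Sequence[float], y: Sequence[float]) -> float:
    """||Ay - b||_1."""
    return sum(abs(sum(a * yj for a, yj in zip(row, y)) - bi) for row, bi in zip(A, b))


def sum_dist_to_hyperplane(points: Sequence[Sequence[float]], w: Sequence[float]) -> float:
    """Sum of distances from the points to the hyperplane {q : <q, w> = 0}."""
    norm = math.sqrt(sum(c * c for c in w))
    return sum(abs(sum(pc * wc for pc, wc in zip(p, w))) for p in points) / norm


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 0.0]
y = [2.0, 1.0]

points: List[List[float]] = [row + [bi] for row, bi in zip(A, b)]  # rows of [A | b]
w = y + [-1.0]                                                     # normal vector [y^T, -1]^T
direct = l1_residual(A, b, y)
via_hyperplane = math.sqrt(sum(c * c for c in w)) * sum_dist_to_hyperplane(points, w)
print(direct, via_hyperplane)  # equal up to floating-point rounding
```

Because the scaling factor is the same for every point, a $(1\pm\varepsilon)$ guarantee on the sum of distances for every hyperplane transfers to $\|Ay - b\|_1$ for every $y$.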

In particular, our strong coresets for subspace approximation with $j = d - 1$ yield a strong coreset for the linear regression problem of size $t = O(d^2/\varepsilon^2)$. The construction time is $O(nd^2 + d^2\varepsilon^{-2}\log n)$. Computing the $\ell_1$ regression on the strong coreset would thus take $O(nd^2 + \mathrm{poly}(d/\varepsilon))$ time (e.g., using [DDH+08]). Maintaining these strong coresets in the streaming model yields a streaming algorithm that takes space $t = O(d^2\log^{O(1)} n/\varepsilon^2)$. As mentioned in the beginning of Section 2, the results generalize to any $z > 1$, where $\varepsilon$ is replaced by $\varepsilon^{O(z)}$ in our running times and coreset sizes.
Efficient approximation algorithms for the regression problem are given by Clarkson [Cla05] for $z = 1$, Drineas, Mahoney, and Muthukrishnan [DMM06] for $z = 2$, and Dasgupta et al. [DDH+08] for $z > 1$, in time $O(nd^{O(1)}\log n + \mathrm{poly}(d/\varepsilon))$. All these results are obtained by constructing weak coresets for the corresponding problem. Some small-space streaming algorithms are available in the turnstile model (where the points are constrained to lie on an integer grid of size $n^{O(1)}$) for regression, by [FMSW10] for $1 < z < 2$ and by [CW09] for $z = 2$. However, we are not aware of previous strong or streaming coresets for the original (unconstrained) problem.

2.5 Projective clustering 

We now discuss the broad setting in which both $j$ and $k$ may be arbitrary. When $j \ge 2$ and $k$ is taken to be general, there are no strong coresets (of size $o(n)$) for these problems, even for $j = k = 2$ and $d = 3$; this can be proven using a simple generalization of the results of [HP04]. Also, for $k > \log n$, the optimization problem cannot be approximated in polynomial time, for any approximation factor, unless P=NP [MT83]. However, the problem does allow the following bicriteria approximations (where one allows some leeway both in the number or dimension of the flats and in the quality of the objective function). In what follows, an $(\alpha, \beta)$ bicriteria solution is a set $B$ of $\beta$ flats such that clustering the points of $P$ via $B$ costs at most $\alpha$ times the cost of the optimal $k$-clustering. We now present our results in this context.

Bicriteria Approximations. Given a set $P$ of points in $\mathbb{R}^d$ whose minimum enclosing ball has radius $r^*$, suppose we want to compute a set of $O(\log n)$ balls of radius at most $r^*$ that covers $P$. There is a generic and simple greedy algorithm that computes such a set in $O(nd)$ time using the theory of VC dimension [BG95]. This algorithm works for any family of shapes of small VC dimension. In this paper we generalize this algorithm to non-covering problems. In general, our bicriteria algorithm has many advantages over previous work (e.g., [Ind99, CS07]): it is widely applicable (to general families of functions, not necessarily over metric spaces), it is more efficient (in terms of both the approximation factors and the running time), and it implies deterministic constructions.

In the context of projective clustering, an $(\alpha, \beta)$-bicriteria approximation algorithm was suggested in [FFSS07], which produces, with high probability, at most $\beta(k, j, n) = \log n\cdot(jk\log\log n)^{O(j)}$ flats of dimension $j$, which exceed the optimal objective value for any $k$ $j$-dimensional flats by a factor of $\alpha(j) = 2^{O(j)}$. The running time is $dn\log n\cdot(2k)^{O(j)}$. Our framework improves upon this result (in the running time, $\alpha$, and $\beta$) and yields several bicriteria approximation algorithms. For small values of $j$ and $k$, we present a bicriteria algorithm that yields an $\alpha = 1 + \varepsilon$ approximation. It returns $\beta = k\log n$ flats in time $O(dnjk) + d\cdot\mathrm{poly}(j, k, 1/\varepsilon) + 2^{\mathrm{poly}(jk/\varepsilon)}\log^{O(1)} n$. For large values of $k$, we suggest a $(1+\varepsilon, \beta)$-approximation that returns $\beta = \log n\cdot k^{\mathrm{poly}(j/\varepsilon)}$ flats of dimension $j$, and the running time is $O(dn\beta) + d\cdot\mathrm{poly}(j, k, 1/\varepsilon)\cdot\log^{O(1)} n$.

Low-Dimensional $B$-Coresets for large $j$. Deshpande and Varadarajan [DV07] describe an algorithm that returns a subspace $V$ spanned by $\mathrm{poly}(jk/\varepsilon)$ points that is guaranteed, with probability at least 1/2, to contain $k$ $j$-subspaces whose union is a $(1+\varepsilon)$-approximation to the optimal solution. Using the volume sampling technique, their algorithm runs in time linear in $dn$, with a multiplicative factor that depends only on $j$, $k$, and $\varepsilon$, for any $z > 1$.

Note that this result does not have the reduction property of weak coresets as defined in the beginning of this section. That is, even if we have an algorithm that computes the optimal set $x^*$ of $k$ $j$-subspaces for any given set of points, it is not clear how to use it with $V$ to obtain a more efficient solution for the original problem. Similarly, it seems that this result cannot be generalized to the streaming model, where the subspace $V$ needs to be computed for a stream of $n$ points $P$ using less than $O(nd)$ space.

For these problems (where $k, j > 1$), we suggest strong, weak, and streaming coresets contained in low-dimensional subspaces, which therefore take sublinear space. Our coresets, referred to as $B$-coresets, were described in Section 2.1, and are used as the first step in the construction of all the coresets presented in this section (including when $j = 1$ or $k = 1$).
3 Novelties in proof techniques



As specified in Section 2, our unified framework yields a number of improved results in the context of approximate clustering and shape fitting. In what follows, we briefly touch on the major new ideas used in our algorithms that allow these improved results.

Reduction to $\varepsilon$-approximation: The main reason that our framework is able to address a spectrum of clustering and approximation problems lies in our reduction from the inconsistent definitions of coresets to the notion of $\varepsilon$-approximation. Using this reduction we can: (i) use a common ground in our analysis, thus removing the specialized (and sometimes tedious) analysis of the required sample sizes used in many of the related works mentioned in Section 2; (ii) use smaller sample sizes that improve on those obtained in previous works, due to recent results from Machine Learning [LLS00]; (iii) apply numerous results from Computational Geometry, dating back to [HW86], regarding VC dimension and $\varepsilon$-approximations, for example, deterministic constructions [Mat95], constructions for convex shapes (which have unbounded VC dimension) [CEG+95], and constructions in the streaming model [BCEG07].

Our reduction includes multiple stages and uses the new notions of robust approximation and robust coresets as intermediate steps. We elaborate on our reduction to $\varepsilon$-approximation (including our new notions) in Section 4, which gives a detailed overview of our framework.

Functional representation of data elements and coresets: To study coresets over a wide range of objectives, we present an abstract framework in which the data points are considered as functions. Namely, for a center $x$, the value $f(x)$ represents the cost of clustering the data element corresponding to $f$ with $x$. This representation is not superficial, and is in a sense crucial, as in our setting the coresets we construct are no longer "data elements" (as is common in the literature) but rather functions as well. Indeed, in some cases, our coresets correspond to a subset of data elements, and thus their representation by functions has no special meaning. However, in several cases the coreset consists of a small set of functions that are closely related to the original data functions, but differ in certain behaviors.

For example, several of our coresets use functions $g$ corresponding to data functions $f$ such that $g(x) = f(x)$ only if $f(x)$ is smaller than a certain threshold; otherwise $g(x)$ is neglected and set to zero. Another example is the use of functions $g$ that correspond fully to data elements $f$, but appear in the coreset with negative weight. We extend and generalize results from [FMSW10] that had such properties. However, unlike in [FMSW10], a PTAS for the optimization problem can be computed from our coresets without using the original data.
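In this functional view, a coreset is simply a list of weighted functions, and its cost at $x$ is the weighted sum of their values. The Python sketch below (hypothetical names) shows a thresholded function $g$ and a negatively weighted term; the threshold `tau` and the weights are arbitrary illustrative values, not values prescribed by the framework.

```python
from typing import Callable, List, Tuple

Func = Callable[[float], float]


def truncated(f: Func, tau: float) -> Func:
    """g(x) = f(x) if f(x) < tau, and 0 otherwise."""
    def g(x: float) -> float:
        v = f(x)
        return v if v < tau else 0.0
    return g


def weighted_cost(D: List[Tuple[float, Func]], x: float) -> float:
    """cost(D, x) for a weighted set of functions; weights may be negative."""
    return sum(w * g(x) for w, g in D)


def f(x: float) -> float:
    return abs(3.0 - x)  # data function of the 1-D point 3


def h(x: float) -> float:
    return abs(x)  # data function of the 1-D point 0


D = [(2.0, truncated(f, 5.0)), (-1.0, h)]
print(weighted_cost(D, 1.0), weighted_cost(D, 10.0))  # prints 3.0 -10.0
```

At $x = 10$ the value $f(10) = 7$ exceeds the threshold, so the truncated term contributes nothing and only the negatively weighted term remains.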

One may argue that this skewed succinct representation of the original data violates the traditional line of thought in which a coreset consists of a subset of "real" data elements, and thus in many cases we make an effort to find such "standard" coresets. However, when considering the computational objective in the construction of coresets, namely, a tool that allows the efficient approximation of clustering problems, our notion of coresets plays a role equivalent to that of standard coresets. The flexibility of allowing our coresets to deviate from the standard conception is key to our ability to obtain improved results.

Generalized range spaces: In the vast literature on clustering, the notion of coresets is defined in several ways. Two common definitions are strong and weak coresets, which, roughly speaking, address the combinatorial and computational aspects of clustering, respectively. Namely, strong coresets require behavior similar to that of the data set for every set of centers, while weak coresets require "just enough" so that the coreset can be used in the design of efficient algorithms for approximate clustering.

In this work we unify the study of weak coresets that was used recently in [AHPV05, FMS07, FMSW10] with older results related to $\varepsilon$-approximation [CF90], called $\varepsilon$-frames. As our work reduces the study of coresets to that of $\varepsilon$-approximation in certain range spaces, this unification is captured by the development of a new notion: a generalized range space and a corresponding generalized dimension.
More specifically, in the standard study of range spaces, an $\varepsilon$-approximation captures the properties of the original space with respect to any range in the space. This intuitively corresponds to the study of strong coresets. For the (more delicate) study of weak coresets, we enhance the standard definition of a range space to obtain a generalized definition and theory. In our generalized view, an $\varepsilon$-approximation captures the properties of the original space with respect to a subset of predetermined ranges in the space (and not necessarily all of the ranges). By choosing the predetermined subsets carefully, one may capture the essence of weak coresets. The study of generalized range spaces enables us to use the same algorithms in our constructions of coresets, whether weak or strong, where the difference in the obtained results (in size and running time) is now easily traced back to the generalized dimension of the range space at hand.



4 Framework overview 

We now review the concepts of $\varepsilon$-approximations and $\varepsilon$-coresets, followed by a detailed overview of our general framework.

4.1 Approximations and coresets 

For a multi-set $F$ of non-negative functions on a set $X$, we say that $S \subseteq F$ is an $\varepsilon$-approximation for $F$ if for every $x \in X$ and $r \ge 0$ we have

$$\left|\frac{|\mathrm{range}(F, x, r)|}{|F|} - \frac{|\mathrm{range}(S, x, r)|}{|S|}\right| \le \varepsilon,$$

where $\mathrm{range}(S, x, r) = \{f \in S \mid f(x) \le r\}$.
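For a finite set of queries, this definition can be checked by brute force. The Python sketch below uses hypothetical names and only checks the condition on a finite query set `X`, whereas $X$ may be infinite in general. It relies on the observation that, for a fixed $x$, both fractions are step functions of $r$ that change only at the values $f(x)$ (and are both 0 below the smallest value), so it suffices to test thresholds $r$ taken from these values.

```python
from typing import Callable, Iterable, Sequence

Func = Callable[[float], float]


def range_size(F: Sequence[Func], x: float, r: float) -> int:
    """|range(F, x, r)| = number of f in F with f(x) <= r."""
    return sum(1 for f in F if f(x) <= r)


def is_eps_approximation(F: Sequence[Func], S: Sequence[Func],
                         X: Iterable[float], eps: float) -> bool:
    """Check | |range(F,x,r)|/|F| - |range(S,x,r)|/|S| | <= eps for every x in X, r >= 0."""
    for x in X:
        # Both fractions are 0 below the smallest value, so only these thresholds matter.
        thresholds = {f(x) for f in F} | {g(x) for g in S}
        for r in thresholds:
            gap = abs(range_size(F, x, r) / len(F) - range_size(S, x, r) / len(S))
            if gap > eps:
                return False
    return True


def make_f(p: float) -> Func:
    return lambda x: abs(p - x)  # f_p(x) = |p - x| for a 1-D point p


F = [make_f(float(p)) for p in range(10)]
S = F[::2]  # every other function
X = [0.0, 4.5, 9.0]
print(is_eps_approximation(F, S, X, 0.15), is_eps_approximation(F, S, X, 0.05))  # prints True False
```

On these queries the largest gap is $0.1$ (for example at $x = 0$, $r = 0$, where the fractions are $1/10$ and $1/5$), so $S$ passes for $\varepsilon = 0.15$ and fails for $\varepsilon = 0.05$.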

For a set $F$ of non-negative functions on a set $X$, we say that $D$ is an $\varepsilon$-coreset for $F$ if for every $x \in X$ we have

$$(1 - \varepsilon)\,\mathrm{cost}(F, x) \le \mathrm{cost}(D, x) \le (1 + \varepsilon)\,\mathrm{cost}(F, x),$$

where $\mathrm{cost}(F, x) = \sum_{f\in F} f(x)$ and $\mathrm{cost}(D, x) = \sum_{f\in D} f(x)$. In this paper we forge a link between $\varepsilon$-approximations and $\varepsilon$-coresets for general families of queries. As a warm-up, we present the following theorem, which is a special case of our main theorem (Theorem 4.11). It relates to the notion of sensitivity that was introduced in [LS10] for $k$-median type problems.

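Theorem 4.1 Let $F$ be a set of functions from $X$ to $[0, \infty)$ and $0 < \varepsilon < 1/4$. Let $m : F \to \mathbb{N} \setminus \{0\}$ be a function on $F$ such that

$$m(f) \ge n \cdot \max_{x\in X} \frac{f(x)}{\mathrm{cost}(F, x)}. \qquad (1)$$

For each $f \in F$, let $g_f : X \to [0, \infty)$ be defined as $g_f(x) = f(x)/m(f)$. Let $G_f$ consist of $m(f)$ copies of $g_f$, and let $S$ be an $\left(\varepsilon n/\sum_{f\in F} m(f)\right)$-approximation of the set $G = \bigcup_{f\in F} G_f$. Then $D = \{g_f \cdot |G|/|S| \mid g_f \in S\}$ is an $\varepsilon$-coreset for $F$. That is, for every $x \in X$,

$$|\mathrm{cost}(F, x) - \mathrm{cost}(D, x)| \le \varepsilon\,\mathrm{cost}(F, x).$$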
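A minimal sketch of the construction in Theorem 4.1, in Python, assuming the functions and the values $m(f)$ are given. Here $S$ is drawn as a uniform random sample (with replacement) from $G$, which is an approximation of the required quality only with some probability that depends on the sample size; the theorem itself assumes that $S$ is an $(\varepsilon n/\sum_f m(f))$-approximation. The names and the toy instance (1-D points with $f_p(x) = |p - x|$, and $m$ values assumed to satisfy (1)) are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Func = Callable[[float], float]


def sample_G(m: Sequence[int], t: int, rng: random.Random) -> List[int]:
    """Draw t elements of G uniformly with replacement.

    An element of G is a copy of g_f and is identified by the index of f; since G_f has
    m[i] copies, index i is drawn with probability m[i] / |G|.
    """
    return rng.choices(range(len(m)), weights=m, k=t)


def coreset_from_sample(F: Sequence[Func], m: Sequence[int],
                        S: Sequence[int]) -> List[Tuple[float, Func]]:
    """D = {g_f * |G|/|S| : g_f in S} as (weight, f) pairs, using g_f = f / m(f)."""
    G_size = sum(m)
    return [(G_size / (len(S) * m[i]), F[i]) for i in S]


def cost(D: Sequence[Tuple[float, Func]], x: float) -> float:
    return sum(w * f(x) for w, f in D)


def make_f(p: float) -> Func:
    return lambda x: abs(p - x)


points = [0.0, 1.0, 2.0, 3.0, 10.0]
F = [make_f(p) for p in points]
m = [3, 3, 2, 3, 6]  # assumed to satisfy inequality (1) for this instance

# Sanity check: taking S = G itself reproduces cost(F, x) (up to rounding).
full_G = [i for i, mi in enumerate(m) for _ in range(mi)]
D_exact = coreset_from_sample(F, m, full_G)

# A random sample of G, as in the PAC-learning argument.
D = coreset_from_sample(F, m, sample_G(m, 200, random.Random(0)))
for x in [0.0, 2.5, 20.0]:
    exact = sum(f(x) for f in F)
    print(x, exact, math.isclose(cost(D_exact, x), exact), cost(D, x))
```

Splitting $f$ into $m(f)$ scaled copies caps every element of $G$ at $\mathrm{cost}(F, x)/n$, which is what lets a uniform $\varepsilon$-approximation of $G$ control the total cost.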

Algorithm BICRITERIA$(F, \varepsilon, \alpha, \beta)$
1  $i \leftarrow 1$; $F_1 \leftarrow F$
2  while $|F_i| > 10/\varepsilon$
3      $Y_i \leftarrow$ a $(3/4, \varepsilon, \alpha, \beta)$-median of $F_i$
4      $G_i \leftarrow$ the set of the $\lceil (1 - 5\varepsilon)\cdot 3|F_i|/4 \rceil$ functions $f \in F_i$ with the smallest value $f(Y_i)$
5      $F_{i+1} \leftarrow F_i \setminus G_i$
6      $i \leftarrow i + 1$
7  $Y_i \leftarrow$ an $(\alpha, \beta)$ bicriteria approximation to $F_i$
8  return $\bigcup_j Y_j$

Fig. 1: The algorithm BICRITERIA.

For example, suppose that we are given a set $P$ of $n$ points in $\mathbb{R}^d$, and we wish to compute a small set of functions $D$ such that, for every $x \in \mathbb{R}^d$, $\mathrm{cost}(D, x)$ is a $(1 + \varepsilon)$-approximation to the sum of Euclidean distances $\sum_{p\in P}\|p - x\|$. For every $p \in P$ and $x \in X = \mathbb{R}^d$, let $f_p(x) = \|p - x\|$, and let $F = \{f_p \mid p \in P\}$. Let $x^*$ denote the point that minimizes the sum of distances to $P$, and define

$$m(f_p) = \left\lceil \frac{n\, f_p(x^*)}{\mathrm{cost}(F, x^*)} \right\rceil + 2.$$



It is not hard to verify that (1) holds for this definition of $m(f_p)$ and that $\sum_{f\in F} m(f) = O(n)$; see [LS10]. By PAC-learning theory, a random sample $S \subseteq G$ of size $O(d/\varepsilon^2)$ is, with high probability, an $\varepsilon$-approximation of the set $G$ defined in Theorem 4.1; see [LLS01]. By Theorem 4.1 we conclude that there exists a set $D$, $|D| = O(d/\varepsilon^2)$, such that $|\mathrm{cost}(F, x) - \mathrm{cost}(D, x)| \le \varepsilon\,\mathrm{cost}(F, x)$, as desired. In the next sections we present tools that allow us to compute such a small coreset $D$ efficiently, to deal with high-dimensional spaces (say, when $d = n$), and to handle $k$-clustering problems (for example, when $x = (x_1, \ldots, x_k)$ and $f_p(x) = \min_i \|p - x_i\|$).
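The sensitivity bound above can be checked numerically in one dimension. The following Python sketch (illustrative only; the point set and the grid of test centers are arbitrary choices) computes $m(f_p) = \lceil n f_p(x^*)/\mathrm{cost}(F, x^*)\rceil + 2$ with $x^*$ a median of the points, and verifies inequality (1) on a grid of candidate centers together with $\sum_f m(f) \le 4n$. The bound follows from $f_p(x) \le f_p(x^*) + |x - x^*|$ and $n|x - x^*| \le \mathrm{cost}(F, x) + \mathrm{cost}(F, x^*) \le 2\,\mathrm{cost}(F, x)$.

```python
import math
from typing import List, Sequence


def sensitivities_1median(points: Sequence[float]) -> List[int]:
    """m(f_p) = ceil(n * f_p(x*) / cost(F, x*)) + 2, where x* is a median of the points."""
    xs = sorted(points)
    x_star = xs[len(xs) // 2]  # a median minimizes the sum of |p - x| in one dimension
    c_star = sum(abs(p - x_star) for p in points)
    n = len(points)
    return [math.ceil(n * abs(p - x_star) / c_star) + 2 for p in points]


def check_inequality_1(points: Sequence[float], m: Sequence[int],
                       centers: Sequence[float]) -> bool:
    """Check m(f_p) >= n * f_p(x) / cost(F, x) for every candidate center x given."""
    n = len(points)
    for x in centers:
        c = sum(abs(p - x) for p in points)
        for p, mp in zip(points, m):
            if n * abs(p - x) / c > mp + 1e-9:  # small tolerance for rounding
                return False
    return True


points = [0.0, 1.0, 2.0, 3.0, 10.0]
m = sensitivities_1median(points)
grid = [i / 4 for i in range(-80, 121)]  # candidate centers in [-20, 30]
print(m, sum(m) <= 4 * len(points), check_inequality_1(points, m, grid))  # prints [3, 3, 2, 3, 6] True True
```

The outlier at 10 receives the largest value $m = 6$, reflecting that its function can carry a large share of the total cost for some centers.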



4.2 Bicriteria approximation 

As is common in several studies of geometric clustering, our starting point is bicriteria approximation. Given the function family $F$ and a set of potential centers $X$, an $(\alpha, \beta)$ bicriteria solution to the clustering problem $(F, X)$ is a subset $B$ of $X$ of size $\beta$ such that $\mathrm{cost}(F, B) \le \alpha\min_{x\in X}\mathrm{cost}(F, x)$. Here, for a set $B$, the term $\mathrm{cost}(F, B)$ is equal to $\sum_{f\in F} f(B)$, where $f(B)$ is a slight abuse of notation that represents the expression $\min_{x\in B} f(x)$. Efficient bicriteria approximation algorithms for constant values of $\alpha$ and $\beta$ have been extensively studied over the last decade for a number of function families $F$. For example, in [HPM04, Che06, FFS06, FMS07, FFKN09, FMSW10, LS10] the starting point for the efficient construction of small $\varepsilon$-coresets for $k$-median is an efficient bicriteria algorithm for $k$-median. Bicriteria approximation was also used as a starting point for computing clusterings in the setting of outliers and penalties; see [CKMN01, Che08].

The first part of our framework yields a general paradigm for bicriteria approximations that essentially reduces the task at hand to that of ε-approximations from the theory of Machine/PAC Learning and VC dimension [VC71, HW86]. Roughly speaking, our reduction consists of three steps. In the first step, we determine the combinatorial complexity of the clustering problem at hand by defining a corresponding generalized range space and studying its generalized VC-dimension (we elaborate on these notions shortly). We then show that an ε-approximation to the corresponding range space yields a relaxed notion of bicriteria clustering that we refer to as a robust median. Finally, we show how to use these robust medians in order to obtain a bicriteria solution. An outline of our framework follows.



Generalized VC dimension: Given the clustering problem at hand (i.e., the function family F), one starts by defining a corresponding range space and by studying its combinatorial complexity (i.e., its dimension).

Definition 4.2 (e.g., [LLS00]) Let F be a finite set of functions from a set X to [0, ∞). The dimension dim(F) of F is the dimension of the range space (F, ranges(F)), where ranges(F) is defined as follows. For every x ∈ X and r ≥ 0, let range(x, r) = {f ∈ F | f(x) ≤ r}. Let ranges(F) = {range(x, r) | x ∈ X, r ≥ 0}. The dimension of (F, ranges(F)) is the minimum d such that

$$\forall S\subseteq F:\quad|\{S\cap\mathrm{range}\mid\mathrm{range}\in\mathrm{ranges}(F)\}|\le|S|^d.$$

To allow the unified study of both strong and weak coresets, we extend the definition above to that of a generalized range space. In a generalized range space corresponding to F, for every subset S of functions one defines a corresponding subset of important ranges ranges(S) ⊆ ranges(F). In our context of clustering, the set ranges(S) will be defined by a subset 𝒳(S) of centers x ∈ X that is guaranteed to include a good center to be used in the clustering of S. More precisely:

Definition 4.3 Let F be a finite set of functions from a set X to [0, ∞). Let 𝒳 be a function that maps every subset S ⊆ F to a set of items 𝒳(S) ⊆ X. The pair (F, 𝒳) is called a generalized function space if for any S ⊆ S' it holds that 𝒳(S) ⊆ 𝒳(S'). The dimension of (F, 𝒳) is the smallest integer d such that

$$\forall S\subseteq F:\quad|\{S\cap\mathrm{range}\mid\mathrm{range}\in\mathrm{ranges}(S)\}|\le|S|^d,$$

where ranges(S) = {range(x, r) | x ∈ 𝒳(S), r ≥ 0}.



For a generalized function space (F, 𝒳), we now seek small subsets S ⊆ F that are ε-approximations to the range space (F, ranges(S)). Loosely speaking, such sets approximate the function set F with respect to the centers in 𝒳(S) that are (by definition) of "importance" to the approximation of S. Combining this with a proof that centers that approximate S also approximate F yields the weak coresets we desire. Notice that in the above definition we have required the function 𝒳 to be monotone. This allows us to obtain the following (immediate) connection between random sampling and ε-approximation (e.g., via [LLS01]).

Theorem 4.4 Let (F, 𝒳) be a function space of dimension d from X to [0, ∞). Let ε, δ > 0. Let S be a sample of |S| = (c/ε²)(d + log(1/δ)) i.i.d. functions from F, where c is a sufficiently large constant. Then, with probability at least 1 − δ, S is an ε-approximation of the range space (F, ranges(S)).



To illustrate our definitions, consider the standard problem of k-median in R^d. Here, the range space corresponding to F in Definition 6.4 has dimension O(dk). Thus, using this range space in our work would imply weak coresets and algorithms whose running time depends in an undesired fashion on d. As all our algorithms are, at their core, based on the notion of ε-approximation, to avoid this dependence on d it suffices to define a generalized function space whose dimension is independent of d.

Indeed, using the results of [SV07] it can be shown that every subset S of F has a low-dimensional corresponding set of centers (set of k-tuples) 𝒳(S) such that

$$\min_{x\in\mathcal{X}(S)}\mathrm{cost}(S,x)\le(1+\varepsilon)\min_{x\in(\mathbb{R}^d)^k}\mathrm{cost}(S,x).$$

Specifically, 𝒳(S) consists of all k-tuples x in the subspaces spanned by ε^{-…} log(1/ε) points of S. It is not hard to verify that the dimension of (F, 𝒳) is now O(k ε^{-…} log(1/ε)), and thus independent of d. This finally yields, via Theorem 7.3, a succinct ε-approximation S that approximates F on all centers in 𝒳(S).
From ε-approximation to robust medians: In what follows we define the robust median problem, a relaxed version of bicriteria clustering that strongly resembles the problem of clustering with outliers. In a nutshell, a robust median for a set of data elements (functions) S is a set of centers Y ⊆ X that clusters all but a small fraction of the elements in S very efficiently. In the definition below, the parameter α refers to the quality of the clustering, the parameter β refers to the size of Y, the parameter γ refers to the amount of outliers, and ε is a slackness parameter.

Definition 4.5 Let F be a set of n functions from a set X to [0, ∞). Let 0 ≤ ε, γ ≤ 1, and α > 0. For every x ∈ X, let F_x denote the ⌈γn⌉ functions f ∈ F with the smallest value f(x). Let Y ⊆ X, and let G be the set of the ⌈(1 − ε)γn⌉ functions f ∈ F with the smallest value f(Y) = min_{y∈Y} f(y). The set Y is called a (γ, ε, α, β)-median of F if |Y| = β and

$$\mathrm{cost}(G,Y)\le\alpha\cdot\min_{x\in X}\mathrm{cost}(F_x,x).$$
Notice that a set of centers Y which is a (1, 0, α, β)-median is (by definition) an (α, β) bicriteria approximation. Thus, one is interested in finding good robust medians for F. We show that this is possible via ε-approximations S to the function space (F, 𝒳). In the lemma below we use β = 1. We note that a similar lemma, for general β, also holds, and appears in the appendix.
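To make Definition 4.5 concrete, the sketch below checks the robust-median condition by brute force when the candidate center set X is finite. It assumes the condition reads cost(G, Y) ≤ α · min_{x∈X} cost(F_x, x), with F_x and G as defined above; the helper name `is_robust_median` and the toy data are ours, for illustration only.

```python
import math


def is_robust_median(F, Y, X, gamma, eps, alpha, beta):
    """Brute-force check of the (gamma, eps, alpha, beta)-median condition.

    F : list of callables, each mapping a center to a value in [0, inf)
    Y : list of centers (the candidate robust median)
    X : finite list of candidate centers, so that the minimum can be computed
    """
    n = len(F)
    if len(Y) != beta:
        return False
    size_fx = math.ceil(gamma * n)             # |F_x| = ceil(gamma * n)
    size_g = math.ceil((1 - eps) * gamma * n)  # |G| = ceil((1 - eps) * gamma * n)
    # cost(G, Y): sum of the size_g smallest values f(Y) = min_{y in Y} f(y)
    cost_g = sum(sorted(min(f(y) for y in Y) for f in F)[:size_g])
    # min over x in X of cost(F_x, x): the size_fx smallest values f(x), for each x
    best = min(sum(sorted(f(x) for f in F)[:size_fx]) for x in X)
    return cost_g <= alpha * best


# Five points on the real line, one of them an outlier; f_p(c) = |p - c|.
points = [0, 1, 2, 3, 100]
F = [lambda c, p=p: abs(p - c) for p in points]
X = list(range(101))
print(is_robust_median(F, [2], X, gamma=0.8, eps=0.5, alpha=1.0, beta=1))
print(is_robust_median(F, [100], X, gamma=0.8, eps=0.5, alpha=1.0, beta=1))
```

The center 2 serves the four inliers well and passes, while the outlier 100 as a single center fails unless α is raised substantially.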

Lemma 4.6 Let (F, 𝒳) be a function space of dimension d. Let γ ∈ (0, 1], ε ∈ (0, 1/10), δ ∈ (0, 1/10), and α > 0. Let S be a random sample of s = (c/…)(d + log(1/δ)) i.i.d. functions from F, where c is a sufficiently large constant. Suppose that x ∈ 𝒳(S) is a ((1 − δ)γ, ε, α, 1)-median of S, and that |F| ≥ s. Then, with probability at least 1 − δ, x is a (γ, 4ε, α, 1)-median of F.

Once the connection between ε-approximation and robust medians is established, one can find robust medians for F via an exhaustive (or sometimes more efficient) algorithm that operates on the ε-approximation S.

From robust medians to bicriteria: We are now ready to present our algorithm for bicriteria approximation. Before presenting it, we note that although an (α, β)-bicriteria approximation is precisely a (1, 0, α, β)-median, we cannot use Lemma 9.6 above to obtain a bicriteria solution (as in Lemma 9.6, ε > 0 and there is a slackness in the reduction with respect to γ).

Our algorithm BICRITERIA(F, ε, α, β) for bicriteria approximation appears in Figure 1. The algorithm receives the function family F and parameters α, β, ε, and outputs a subset of centers of size logarithmic in |F| that acts as a bicriteria approximation to the median problem on F. The main subroutine call in BICRITERIA is the computation of a (3/4, ε, α, β)-median for F_i, which is essentially done via the connection to ε-approximation specified above. Namely, to compute a (3/4, ε, α, β)-median for the function set F_i (defined in the algorithm), we take a random sample S of F_i, find a corresponding robust median for S, and return it as a robust median for F_i. Our main theorem in the context of bicriteria approximation follows.
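The peeling loop of Fig. 1 can be sketched in Python for points with f_p(y) = ||p − y|| and β = 1. Several assumptions are made for illustration: the robust median of F_i is a heuristic stand-in (a random sample of F_i, followed by an exhaustive search over the sample for the point whose closest 3/4 of the sample is cheapest), and line 7 is replaced by the trivial bicriteria that turns every remaining element into a center. The names `robust_median_by_sampling` and `bicriteria` are hypothetical.

```python
import math
import random


def robust_median_by_sampling(F, sample_size, rng):
    """Heuristic stand-in for a (3/4, eps, alpha, 1)-median of the points F:
    pick the sample point whose closest 3/4 of the sample is cheapest."""
    S = rng.sample(F, min(sample_size, len(F)))
    q = math.ceil(0.75 * len(S))
    return min(S, key=lambda y: sum(sorted(math.dist(p, y) for p in S)[:q]))


def bicriteria(P, eps, sample_size=30, seed=0):
    """Sketch of Fig. 1 for f_p(y) = ||p - y|| and beta = 1 (assumptions)."""
    rng = random.Random(seed)
    F = list(P)                                            # line 1: F_1 = F
    B = []
    while len(F) > 10 / eps:                               # line 2
        y = robust_median_by_sampling(F, sample_size, rng)  # line 3: Y_i = {y}
        B.append(y)
        g = math.ceil((1 - 5 * eps) * 3 * len(F) / 4)      # line 4: |G_i|
        F.sort(key=lambda p: math.dist(p, y))              # smallest f(Y_i) first
        F = F[g:]                                          # line 5: F_{i+1} = F_i \ G_i
    B.extend(F)  # line 7, trivially: every remaining point becomes a center
    return B     # line 8: union of the Y_i


rng = random.Random(7)
pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
B = bicriteria(pts, eps=0.05)
print(len(B))
```

Each round removes a constant fraction of the remaining functions, so the number of rounds, and hence the number of robust-median centers, is logarithmic in |F|; the size of the final remainder is bounded by 10/ε.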

Theorem 4.7 Let F be a set of n functions from a set X to [0, ∞), and let α, β > 0 and ε ∈ [0, 1]. Let B be the set that is returned by the algorithm BICRITERIA(F, ε/100, α, β); see Fig. 1. Then B is a ((1 + ε)α, β log n)-approximation for F. That is, |B| ≤ β log₂ n and Σ_{f∈F} min_{x∈B} f(x) ≤ (1 + ε)α · min_{x∈X} cost(F, x). This takes time

$$\mathrm{Bicriteria}=O\bigl(nt+\log n\cdot\mathrm{RobustMedian}+\mathrm{ExhaustiveBicriteria}\bigr),$$

where:

• t is an upper bound on the time it takes to compute f(Y) for a pair f ∈ F and Y ⊆ X such that |Y| ≤ β.

• O(RobustMedian) is the time it takes to compute a (3/4, ε, α, β)-median for a set F' ⊆ F.

• O(ExhaustiveBicriteria) is the time it takes to compute an (α, β) bicriteria for a set F' ⊆ F of size |F'| = O(1/ε).

The size and running time are specified in Theorem 4.7 in an abstract manner, as a function of α, β, ε, RobustMedian, ExhaustiveBicriteria, and, implicitly, d, the generalized VC dimension of the function space (F, 𝒳). In Section 2, we presented some concrete examples in which the size and running time specified in Theorem 4.7 are computed for specific well-studied clustering problems. More examples appear in the appendix of this work. As we show, our framework improves upon the previously best known results.

4.3 From bicriteria to coresets 

Once an (α, β) bicriteria approximation for the clustering problem at hand has been established, we present a paradigm for obtaining coresets (both strong and weak, as defined in Section 2).

We start the description of our results with the special case in which the function set F corresponds to the classical k-median problem in R^d. We then present our framework for the case in which F corresponds to the problem of clustering points onto k lines in R^d (i.e., projective clustering). Finally, we present our framework in its most abstract form, addressing general function families F. The algorithms presented for the first two cases (Figures 2 and 3) are both derived from the general algorithm presented in Figure 4.

The k-median problem in R^d: Let P be a set of data elements in R^d. Let the centers X consist of all k-tuples of R^d. (In this context, there is a function f_p ∈ F corresponding to each point p ∈ P, defined as f_p(x) = dist(p, x).) Our coreset construction in this case is very simple in nature and consists of two major steps. In the first step, using a bicriteria approximation B, we assign a weight m_p to each data element p ∈ P. We then iteratively sample the point set P according to the distribution implied by the weights {m_p} to obtain a small sample S ⊆ P. Our algorithm k-MEDIAN-CORESET is presented in Figure 2.

This general algorithmic paradigm is itself the basis of several recently suggested coreset constructions, e.g., [Che06, FMSW10, FMS07, LS10]. However, the main novelty of our algorithm lies in its second step, which adds the bicriteria centers as additional elements of the coreset. Adding the bicriteria centers to the coreset, combined with a delicate weighting mechanism (that may assign negative weights), enables the proof of the following theorem. In what follows, we assume B is an (O(1), O(k)) bicriteria approximation. This can be obtained from previous works (e.g., [Che06]) or by the use of our framework in an enhanced version of Theorem 4.7 (details appear in the appendix).

Theorem 4.8 Let P be a set of n points in R^d. Let k ≥ 1 be an integer, 0 < ε, δ < 1/2, and t = (c/ε²) · (dk + log(1/δ)), where c is a sufficiently large constant. Then, with probability at least 1 − δ, k-MEDIAN-CORESET(P, B, t, ε) returns a weighted ε-coreset D of P of size t. The running time needed to compute D is O(ndk + log^{…}(1/δ) log^{…} n + k^{…} + t log n).

Replacing R^d by any metric space (M, dist), we obtain an analogous theorem in which the dimension d of the corresponding function space (which affects the sample size t in the theorem) is now log n.

Theorem 4.9 Let (P, dist) be a metric space of n points. Let 0 < ε, δ < 1/2, and t = (c/ε²) · (k log n + log(1/δ)), where c is a sufficiently large constant. Then, with probability at least 1 − δ, k-MEDIAN-CORESET(P, B, t, ε) returns a weighted ε-coreset D of P of size t. The running time needed to compute D is O(nk + log^{…}(1/δ) log^{…} n + k^{…} + t log n).
Algorithm k-MEDIAN-CORESET(P, B, t, ε)

1  for each b ∈ B
2      P_b ← the set of points in P whose closest point in B is b. Ties are broken arbitrarily.
3  for each b ∈ B and p ∈ P_b
       m_p ← |P| dist(p, B)/cost(P, B) + 1.
4  Pick a non-uniform random sample S of t points from P, where the probability
   that a point in S equals p ∈ P is m_p / Σ_{q∈P} m_q.
5  for each p ∈ S
       w(p) ← Σ_{q∈P} m_q / (|S| · m_p)
6  for each b ∈ B
7      w(b) ← (1 + 10ε)|P_b| − Σ_{p∈S∩P_b} w(p)
8  D ← S ∪ B
9  return (D, S, w)

Fig. 2: The algorithm k-MEDIAN-CORESET.



The main idea governing the proofs of Theorems 4.8 and 4.9 lies in the fact that the random sample S of algorithm k-MEDIAN-CORESET is an ε-approximation to (a slightly modified version of) the function family F corresponding to k-median clustering of P. To obtain our succinct setting for t, we perform a delicate analysis which determines the weights {m_p}, {w(p)} and {w(b)} specified in k-MEDIAN-CORESET. In the case of k-median clustering, our coresets consist of points in the data set P (as is common in the study of coresets for approximate clustering). In the coresets to come, this will no longer be the case, and the functional representation of our data will be central.
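A Python sketch of k-MEDIAN-CORESET follows. It uses the weights of Fig. 2 as we read them, m_p = |P| dist(p, B)/cost(P, B) + 1, w(p) = Σ_q m_q/(|S| m_p) and w(b) = (1 + 10ε)|P_b| − Σ_{p∈S∩P_b} w(p). The bicriteria B is simply assumed to be given (here, the true cluster centers of synthetic data), S is sampled with replacement, and the names `kmedian_coreset` and `kmedian_cost` are hypothetical.

```python
import math
import random


def closest(p, C):
    """Index of the point of C closest to p (ties broken by the lowest index)."""
    return min(range(len(C)), key=lambda j: math.dist(p, C[j]))


def kmedian_coreset(P, B, t, eps, rng):
    """Sketch of Fig. 2. Returns (D, w): the t sampled points (with repetitions)
    followed by the centers of B, and the matching (possibly negative) weights."""
    n = len(P)
    owner = [closest(p, B) for p in P]                   # lines 1-2: p belongs to P_b
    d = [math.dist(p, B[j]) for p, j in zip(P, owner)]   # dist(p, B)
    cost_pb = sum(d)
    m = [n * dp / cost_pb + 1 for dp in d]               # line 3: m_p
    total_m = sum(m)
    S = rng.choices(range(n), weights=m, k=t)            # line 4: Pr[p] = m_p / sum m_q
    w_s = [total_m / (t * m[i]) for i in S]              # line 5: w(p)
    w_b = [0.0] * len(B)
    for j in owner:
        w_b[j] += 1 + 10 * eps                           # (1 + 10 eps) |P_b|
    for i, wi in zip(S, w_s):
        w_b[owner[i]] -= wi                              # minus w(p) for p in S ∩ P_b
    return [P[i] for i in S] + list(B), w_s + w_b        # line 8: D = S ∪ B


def kmedian_cost(Q, x, w=None):
    """Weighted k-median cost: sum over q in Q of w(q) * min_j ||q - x_j||."""
    if w is None:
        w = [1.0] * len(Q)
    return sum(wq * min(math.dist(q, c) for c in x) for q, wq in zip(Q, w))


rng = random.Random(11)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # used as the bicriteria B here
P = [(cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)) for cx, cy in centers for _ in range(300)]
D, w = kmedian_coreset(P, centers, 4000, 0.002, rng)
for x in (centers, [(5.0, 5.0), (0.0, 0.0), (20.0, 20.0)]):
    print(kmedian_cost(P, x), kmedian_cost(D, x, w))
```

By construction the total weight of D is exactly (1 + 10ε)|P|: the negative correction at each center b cancels the sampled mass of P_b, which is what lets the sampling error be charged to dist(p, B) rather than to dist(p, x).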

Clustering onto k lines: We now turn to the more complicated case of clustering onto k lines. Namely, let P be a set of data elements in R^d, and let the centers X consist of all k-tuples x of lines in R^d. As in the k-median problem, our starting point is a bicriteria approximation B. However, in this case, our algorithm has three steps instead of two. The first two steps are similar in nature to those of algorithm k-MEDIAN-CORESET; however, instead of returning a standard coreset, they yield a so-called B-coreset (for Bicriteria), discussed in detail shortly. Once a B-coreset is obtained, we take advantage of its structure to obtain a standard coreset.

We start by discussing the first two steps, outlined in algorithm METRIC-B-CORESET of Figure 3. As before, our coreset D is the union of two groups of points in R^d: the subset S, which is obtained by (non-uniform) random sampling, and a second subset, which is obtained via the bicriteria solution B. However, in this case, the second group cannot consist of the (α, β) bicriteria B itself, as B is no longer a succinct set of points but rather a set of lines! Thus, to proceed, we project the points P onto the bicriteria solution to obtain a new set of points P' of size identical to |P|. Namely, for each point p ∈ P we define a new point p' on the line of B closest to p, such that dist(p, B) = ||p − p'||.

Our B-coreset D is now, in essence, the union of the sample S and the set P', denoted by proj(P, B), and acts as a coreset for P. To be more precise, the coreset is a function family which is a weighted and "threshold" defined version of dist(p, x) for points p in S ∪ P'. For a point p ∈ S and a center x ∈ X, the corresponding function in D is proportional to dist(p, x) when p' = proj(p, B) is close to x, and zero
Algorithm METRIC-B-CORESET(P, B, t, ε)

1  for each p ∈ P
       m_p ← |P| dist(p, B)/cost(P, B) + 1.
2  Pick a non-uniform random sample S of t points from P, where for every q ∈ S and p ∈ P,
   we have q = p with probability m_p / Σ_{z∈P} m_z.
3  For p ∈ P, let p' = proj(p, B).
4  for every p ∈ S and set x of points, define
       w(p, x) = Σ_{z∈P} m_z / (|S| · m_p) if dist(p', x) ≤ dist(p, B)/ε, and 0 otherwise.
5  for every p ∈ P and set x of points, define
       w(p', x) = 0 if dist(p', x) ≤ dist(p, B)/ε, and 1 otherwise.
6  D ← S ∪ proj(P, B)
7  return (D, S, w)

Fig. 3: The algorithm METRIC-B-CORESET.

otherwise (via the weight function w(p, x)). In a complementary manner, for a point p' ∈ P' and a center x ∈ X, the corresponding function in D equals dist(p', x) when p' is far from x, and zero otherwise (via the weight function w(p', x)). Roughly speaking, combining the functions corresponding to S and P' in our coreset allows us to prove the quality of D using a case analysis that depends on the query x ∈ X. Namely, for some centers x we assign the cost dist(p, x) to the function in D corresponding to p', and for others to the functions corresponding to S. This freedom allows us to prove that the cost of clustering D is indeed a good approximation to that of clustering P.

However, as the reader may have noticed, our coreset is larger than the set we started with, so where is the gain? The gain is in the structure of the coreset D compared to the data set P: it is (essentially) the union of a small set S with a set P' that lies in a low-dimensional space. Specifically, P' can be partitioned into sets, each consisting of points on a single line (from B). Thus, if B is small (and by Theorem 4.7 it is of logarithmic size), we have conceptually reduced the problem of finding a coreset for P to that of finding a coreset for D, which can now be done by exploiting its specialized structure (e.g., via [FFS06]). The following theorem summarizes the quality of the resulting algorithm, which (a) first runs METRIC-B-CORESET to obtain D corresponding to S and P', (b) then uses [FFS06] and a few additional ideas to find a small set of points S' that is a good approximation to P' (including a corresponding weight function), and (c) returns a succinct function set corresponding to S and S'.
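A Python sketch of METRIC-B-CORESET is given below, with B a set of lines in the plane, each given as a point and a unit direction. It assumes the closeness threshold is dist(p, B)/ε (our reading of the condition in lines 4 and 5 of Fig. 3) and that the query x is a finite set of points; the names `metric_b_coreset` and `dist_to_set` are hypothetical. Instead of materializing D, the function returns a callable that evaluates Σ_{q∈D} w(q, x) dist(q, x).

```python
import math
import random


def project_to_line(p, line):
    """Orthogonal projection of p onto the line {a + s * u}, u a unit vector."""
    a, u = line
    s = sum((pi - ai) * ui for pi, ai, ui in zip(p, a, u))
    return tuple(ai + s * ui for ai, ui in zip(a, u))


def dist_to_set(p, x):
    """dist(p, x) = min over q in x of ||p - q||."""
    return min(math.dist(p, q) for q in x)


def metric_b_coreset(P, B, t, eps, rng):
    """Sketch of Fig. 3 with B a list of lines (a, u). Returns a callable that
    evaluates cost(D, x) = sum over q in D of w(q, x) * dist(q, x)."""
    proj = [min((project_to_line(p, l) for l in B), key=lambda q: math.dist(p, q))
            for p in P]                                    # line 3: p' = proj(p, B)
    d_b = [math.dist(p, q) for p, q in zip(P, proj)]       # dist(p, B) = ||p - p'||
    cost_pb = sum(d_b)
    m = [len(P) * d / cost_pb + 1 for d in d_b]            # line 1: m_p
    total_m = sum(m)
    S = rng.choices(range(len(P)), weights=m, k=t)         # line 2

    def cost_d(x):
        total = 0.0
        for i in S:                                        # line 4: p in S
            if dist_to_set(proj[i], x) <= d_b[i] / eps:    # p' close to x
                total += total_m / (t * m[i]) * dist_to_set(P[i], x)
        for i in range(len(P)):                            # line 5: p' in P'
            d_proj = dist_to_set(proj[i], x)
            if d_proj > d_b[i] / eps:                      # p' far from x
                total += d_proj
        return total

    return cost_d


rng = random.Random(3)
P = [(rng.uniform(-50, 50), rng.gauss(0, 1)) for _ in range(200)]
P += [(rng.gauss(0, 1), rng.uniform(-50, 50)) for _ in range(200)]
B = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 0.0), (0.0, 1.0))]  # two hypothetical lines
cost_D = metric_b_coreset(P, B, 500, 0.1, rng)
for x in ([(1000.0, 1000.0)], [(0.0, 0.0), (20.0, 0.0)]):
    print(sum(dist_to_set(p, x) for p in P), cost_D(x))
```

For a query far from every projected point, only the P' part contributes, and by the triangle inequality the error is at most Σ_p dist(p, p') < ε Σ_p dist(p', x); the sampled part takes over only for the points whose projection is close to x.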

Theorem 4.10 Let P ⊂ R^d, k ≥ 1, 0 < ε, δ < 1/2, r = k + log(1/δ), and t ≥ (c/ε²)(dk + log(1/δ)), for a sufficiently large constant c. A set D of O(t) + (… log n)^{…} points and a weight function w : D × X → [0, ∞) can be computed in O(ndk + dt^{…}) + log^{…} n time such that, with probability at least 1 − δ, for every set x of k lines in R^d,

$$\left|\sum_{p\in P}\mathrm{dist}(p,x)-\sum_{p\in D}w(p,x)\,\mathrm{dist}(p,x)\right|\le\varepsilon\sum_{p\in P}\mathrm{dist}(p,x).$$
Algorithm B-CORESET(F, F', s, m, ε)

1  For each f ∈ F, let t_f : X → [0, ∞) be defined as:
       t_f(x) = f'(x) if f'(x) > s_f(x), and 0 otherwise.
2  Let T = {t_f | f ∈ F}.
3  For each f ∈ F let g_f : X → [0, ∞) be defined as:
       g_f(x) = 0 if f'(x) > s_f(x), and f(x)/m_f otherwise.
4  Let G_f consist of the m_f copies of g_f.
5  G ← ∪_{f∈F} G_f.
6  S ← an ε-approximation of G.
7  U ← {g · |G|/|S| | g ∈ S}.
8  return D ← T ∪ U.

Fig. 4: The algorithm B-CORESET.



The general setting: We now address the general setting in which we are given a general function family F. As in the previous case, our algorithm first finds a B-coreset, and only then may try to utilize the nature of the B-coreset to obtain a standard coreset. Our algorithm B-CORESET for finding the B-coreset is presented in Figure 4 and is phrased in an abstract manner that captures the previously defined coreset algorithms METRIC-B-CORESET and k-MEDIAN-CORESET.

Roughly speaking, as before, our B-coreset consists of two subsets of functions: the subset T, which is defined by the "projection" of F onto a given bicriteria solution B, and the function set U, which is a weighted random sample of the function set F. However, for a general function set F, there is no natural notion of projection. To address this difficulty, we define the projection of F onto a bicriteria solution B as an additional function set F' given as input to B-CORESET. In our analysis, we rely on certain properties of F' that intuitively correspond to the standard notion of projection that arises in various applications. Additional inputs to algorithm B-CORESET include a threshold function s_f : X → [0, ∞) for every f ∈ F, and a weight function m : F → N \ {0}. These play the role of the threshold and weight functions defined in the previous algorithm METRIC-B-CORESET.

We now turn to the set U returned as output by B-CORESET. Notice that there is no explicit random sampling in algorithm B-CORESET. Instead, to construct the set U we use the more general notion of ε-approximation, again on a weighted and threshold-defined variant of F. To be precise, we could have used the notion of ε-approximation in the previously defined coreset algorithms as well, but presented them in terms of random sampling for ease of presentation.

All in all, algorithm B-CORESET returns two sets: the function set T, which corresponds to a threshold version of F' (intuitively, a version of F projected onto a given bicriteria solution), and the function set U, which corresponds to a small ε-approximation of (a threshold and weighted version of) the family F. Our main theorem in this general setting is now:

Theorem 4.11 Let F be a set of functions from X to [0, ∞), and 0 < ε < 1/4. Let s : F × X → [0, ∞), written s_f(x), and let m : F → N \ {0}, written m_f. For each f ∈ F let f' be a corresponding function associated with f, and let F' = {f' | f ∈ F}. For every x ∈ X, let M(x) = {f ∈ F : f'(x) ≤ s_f(x)}. Then for D = B-CORESET(F, F', s, m, ε) it holds that

$$\forall x\in X:\quad|\mathrm{cost}(F,x)-\mathrm{cost}(D,x)|\le\sum_{f\in F\setminus M(x)}|f(x)-f'(x)|+\varepsilon\sum_{f\in F}m_f\cdot\max_{f\in M(x)}\frac{f(x)}{m_f}.$$
Some remarks are in order. Primarily, our presentation of Theorem 4.11 is very general and involves several parameters and function sets. From this presentation, both the size and the quality of our coreset D are hard to decipher. The abstract nature of Theorem 4.11 allows us to apply it to several function families F. In Section 2 we have presented a number of concrete algorithmic applications. These applications are proven in detail in the appendix.

Secondly, as discussed in Section 3, the output of algorithm B-CORESET is a new set of functions D that may not be a subset of F. Indeed, this is the case; however, we stress that the set U is essentially a subset of F which differs only by our weights m_f and threshold cut-offs s_f. Moreover, the function set F', and thus the set T, will typically be easy to compute from a bicriteria solution of (F, X). As we have shown, in certain cases, such as the k-median problem discussed previously, we are able to slightly modify our algorithm so that it returns a weighted set of points D as the desired coreset, and not a function set that may have cut-off thresholds.
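The abstract algorithm B-CORESET can be sketched in Python with functions represented as callables. Assumptions: line 7 rescales each element of S by |G|/|S| as in Theorem 4.1, the ε-approximation step is passed in as a parameter `approx`, and the names `b_coreset`, `cost` and the toy data on the real line are hypothetical. Passing `approx` the identity (S = G, trivially an ε-approximation for every ε) makes the behaviour predictable: the error is then exactly Σ_{f∉M(x)} (f'(x) − f(x)), the first term of the bound in Theorem 4.11.

```python
import random


def b_coreset(F, F_prime, s, m, approx):
    """Sketch of Fig. 4 (B-CORESET).

    F, F_prime : lists of callables f and f' (same order), each X -> [0, inf)
    s          : list of threshold callables s_f, each X -> [0, inf)
    m          : list of positive integer weights m_f
    approx     : callable returning an eps-approximation (non-empty list) of G
    Returns D = T + U as a list of callables.
    """
    # Lines 1-2: t_f(x) = f'(x) if f'(x) > s_f(x), and 0 otherwise.
    T = [lambda x, fp=fp, sf=sf: fp(x) if fp(x) > sf(x) else 0.0
         for fp, sf in zip(F_prime, s)]
    # Lines 3-5: g_f(x) = 0 if f'(x) > s_f(x), else f(x) / m_f; m_f copies of each.
    G = []
    for f, fp, sf, mf in zip(F, F_prime, s, m):
        g = lambda x, f=f, fp=fp, sf=sf, mf=mf: 0.0 if fp(x) > sf(x) else f(x) / mf
        G.extend([g] * mf)
    S = approx(G)                                    # line 6
    scale = len(G) / len(S)                          # |G| / |S|, as in Theorem 4.1
    U = [lambda x, g=g: scale * g(x) for g in S]     # line 7 (assumed rescaling)
    return T + U                                     # line 8


def cost(H, x):
    """cost(H, x) = sum of h(x) over h in H."""
    return sum(h(x) for h in H)


# Points on the real line, "projected" onto hypothetical bicriteria centers B = {0, 10}.
pts = [0.0, 1.0, 2.5, 9.0, 10.0]
B = [0.0, 10.0]
proj = [min(B, key=lambda b: abs(p - b)) for p in pts]          # p' = closest center
F = [lambda x, p=p: abs(p - x) for p in pts]                    # f_p(x) = |p - x|
F_prime = [lambda x, q=q: abs(q - x) for q in proj]             # f'_p(x) = |p' - x|
s = [lambda x, d=abs(p - q): d / 0.1 for p, q in zip(pts, proj)]  # s_f = dist(p, B) / eps
m = [1, 2, 3, 1, 1]
D_exact = b_coreset(F, F_prime, s, m, approx=lambda G: list(G))
D_sampled = b_coreset(F, F_prime, s, m, approx=lambda G: random.Random(0).choices(G, k=20))
for x in (0.0, 3.0, 50.0):
    print(x, cost(F, x), cost(D_exact, x), cost(D_sampled, x))
```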
Appendix



5 Road map 

The body of this extended abstract holds a detailed discussion of our results, without elaborating on the rigorous technical content. In this self-contained appendix, we present the complete definitions and proofs of all claims discussed in the body of this work. The appendix is organized as follows.

• In Section 6, we review the notion of ε-approximation for range spaces, and define and analyze the new notion of ε-approximation for function families.

• In Section 7, we define and analyze the notions of generalized range spaces and generalized dimension, including the connection between these notions and the classical notions of Section 6.

• In Section 8, we show a connection between ε-approximations and a new relaxed notion of coresets that we refer to as robust coresets.

• In Section 9, we further study the notion of robust coresets and link it with the notion of a robust median discussed in the body of the paper. This connection ties the notion of robust medians to that of ε-approximations.

• In Section 10 we define the notion of a centroid set, to be used in the sections to come.

• In Section 11 we tie the notion of robust coresets to that of bicriteria approximation, a connection discussed in the body of this work.

• In Section 12, we use the analysis of the previous sections to obtain concrete results on the bicriteria approximation of several clustering problems, some of which were discussed in Section 2 in the body of the paper.

• In Section 13 we use our bicriteria approximation to obtain algorithms for B-coresets (specified in the body of this work). In Section 14 we study the special case in which our functions F correspond to points in a metric space, in Section 15 we focus on the k-median problem in metric spaces, and finally in Section 16 we study the k-median problem in R^d. Many of the concrete results stated in Section 2 are proven in detail in these sections.

• In Section 17 we study the k-line median problem, and prove the results stated in Section 2.

• In Section 18, we show how to apply our framework in order to construct (low-dimensional) B-coresets and coresets for subspace approximation. We apologize to the reader, and note that we are currently still writing parts of this section, which will be uploaded to a future version on arXiv.



6 ε-Approximations

In this section we discuss the basic definitions of ε-approximation used throughout this work.

Definition 6.1 (range space.) A range space is a pair (F, ranges), where F is a set and ranges is a set of subsets of F. The dimension of the range space (F, ranges) is the smallest integer d such that for every G ⊆ F we have

$$|\{G\cap\mathrm{range}\mid\mathrm{range}\in\mathrm{ranges}\}|\le|G|^d.$$

The dimension of a range space relates to (but is not equivalent to) a term known as the VC-dimension of a range space.



Definition 6.2 (ε-approximation of a range space.) A set S of functions is an ε-approximation of the range space (F, ranges) if for every range ∈ ranges we have

$$\left|\frac{|\mathrm{range}|}{|F|}-\frac{|S\cap\mathrm{range}|}{|S|}\right|\le\varepsilon.$$

Usually S ⊆ F; otherwise, S is called in the literature a weak ε-approximation.

The following well-known theorem states that a random sample from F is, with high probability, an ε-approximation of (F, ranges). See the discussion in [HP09].

Theorem 6.3 ([LLS00, VC71]) Let (F, ranges) be a range space of dimension d. Let ε, δ > 0. Let S be a sample of

$$|S|=\frac{c}{\varepsilon^2}\left(d+\log\frac{1}{\delta}\right)\tag{2}$$

i.i.d. items from F, where c is a sufficiently large constant. Then, with probability at least 1 − δ, S is an ε-approximation of (F, ranges).
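Definition 6.2 and Theorem 6.3 can be illustrated on the real line. For f_p(x) = |p − x| every range range(x, r) is F ∩ [x − r, x + r], i.e., a contiguous block of the sorted points, so the smallest ε for which a sample S is an ε-approximation can be computed exactly in linear time after sorting, as the largest gap between two prefix differences. The sketch below (the name `interval_discrepancy` and the data are ours) measures this ε for an i.i.d. sample.

```python
import random
from collections import Counter


def interval_discrepancy(F, S):
    """Smallest eps such that the multiset S is an eps-approximation (Definition 6.2)
    of the range space whose ranges are F ∩ I for closed intervals I of the line.

    F: list of distinct reals; S: list of elements of F (repetitions allowed).
    """
    n, s = len(F), len(S)
    counts = Counter(S)
    taken = 0          # number of sample elements among the k smallest points of F
    hi = lo = 0.0      # running max/min of the prefix difference, starting at k = 0
    for k, p in enumerate(sorted(F), start=1):
        taken += counts[p]
        diff = k / n - taken / s
        hi, lo = max(hi, diff), min(lo, diff)
    # An interval is the difference of two prefixes, so its discrepancy is the gap
    # between two prefix differences; the worst interval attains hi - lo.
    return hi - lo


rng = random.Random(5)
F = [rng.random() for _ in range(1000)]
S = rng.choices(F, k=3000)   # i.i.d. sample with replacement, as in Theorem 6.3
print(interval_discrepancy(F, S))
```

Here the ranges have dimension at most 2, so by (2) a sample whose size is a constant times 1/ε² suffices for a fixed failure probability; the printed discrepancy decreases roughly like 1/sqrt(|S|).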

Definition 6.4 (range space and dimension of F [LLS00]) Let F be a finite set of functions from a set X to [0, ∞). The dimension dim(F) of F is the dimension of the range space (F, ranges(F)), where ranges(F) is defined as follows. For every x ∈ X and r ≥ 0, let range(F, x, r) = {f ∈ F | f(x) ≤ r}. Let ranges(F) = {range(F, x, r) | x ∈ X, r ≥ 0}.

The following lemma follows directly from our definitions: 

Lemma 6.5 Let F be a set of functions from X to [0, ∞), and let k ≥ 1. For every f ∈ F define a corresponding function f' : X^k → [0, ∞) such that f'(x_1, …, x_k) = min_{1≤i≤k} f(x_i) for every x_1, …, x_k ∈ X. Let F' = {f' | f ∈ F} be the union of these functions. Then dim(F') ≤ k · dim(F).
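The bound in Lemma 6.5 follows from the fact that every range of F' is a union of k ranges of F. As a short derivation sketch (identifying each f' ∈ F' with the function f ∈ F it comes from, and writing d = dim(F)):

$$\mathrm{range}\bigl(F',(x_1,\dots,x_k),r\bigr)=\Bigl\{f' : \min_{1\le i\le k}f(x_i)\le r\Bigr\}=\bigcup_{i=1}^{k}\{f' : f(x_i)\le r\}.$$

Hence, for every S' ⊆ F' with corresponding set S ⊆ F, each trace S' ∩ range is determined by the k traces S ∩ range(F, x_i, r), and

$$|\{S'\cap\mathrm{range}\mid\mathrm{range}\in\mathrm{ranges}(F')\}|\le\bigl(|S|^{d}\bigr)^{k}=|S'|^{kd}.$$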

Definition 6.6 (cost) Let F be a set of functions from X to [0, ∞). Let x ∈ X. We define cost(F, x) = Σ_{f∈F} f(x).

We now define the notion of an ε-approximation for a function set F and tie it to an ε-approximation of the corresponding range space. This notion plays a central part in our work. Roughly speaking, an ε-approximation for a function set F is a subset S that approximates the average cost of ranges in the range space corresponding to F. To make the definition invariant under multiplication by a constant, the quality of the approximation defined below is necessarily related to the parameter r that bounds the values of our functions in the range being considered.

Definition 6.7 (ε-approximation of F) Let F be a set of functions from X to [0, ∞), and let ε ∈ (0, 1). An ε-approximation of F is a set S ⊆ F that satisfies

$$\forall x\in X,\ r\ge0:\quad\left|\frac{\mathrm{cost}(\mathrm{range}(x,r),x)}{|F|}-\frac{\mathrm{cost}(S\cap\mathrm{range}(x,r),x)}{|S|}\right|\le\varepsilon r,$$

where range(x, r) = {f ∈ F | f(x) ≤ r}.

We now show the connection between ε-approximations for range spaces and for function families.
Theorem 6.8 Let F be a set of functions from X to [0, ∞), and let ε ∈ (0, 1). Let S be an ε-approximation of the range space of F. Then S is an ε-approximation of F.

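Proof. Let x ∈ X and r ≥ 0. For every b ≥ 0, let range(b) = range(x, b). Let range(r) = {f_1, …, f_n} denote the n functions in range(r), sorted by their f(x) value. Let a_0 = a_1 = 0, and m = n/⌈εn⌉. For every i, 1 ≤ i ≤ m, let a_{2i} = a_{2i+1} = f_{i⌈εn⌉}(x). We define the partition {F_1, …, F_{2m+1}} of range(r), where F_1 = {f ∈ F | f(x) = 0} and, for 1 ≤ i ≤ m,

$$F_{2i}=\{f\in F\mid a_{2i-1}<f(x)<a_{2i}\},\qquad F_{2i+1}=\begin{cases}\{f\in F\mid f(x)=a_{2i}\} & a_{2i}\neq a_{2i-1},\\ \emptyset & a_{2i}=a_{2i-1}.\end{cases}\tag{3}$$

Note that cost(F_1, x) = 0. For every i, 2 ≤ i ≤ 2m + 1, and S_i = F_i ∩ S, we have

$$\mathrm{cost}(S_i,x)=\sum_{f\in S_i}f(x)=\sum_{f\in S_i}\bigl(f(x)-a_{i-1}\bigr)+|S_i|\,a_{i-1}=\sum_{f\in S_i}\bigl(f(x)-a_{i-1}\bigr)+|S_i|\sum_{j=1}^{i-1}(a_j-a_{j-1}).\tag{4}$$

Let Γ_j = F_{j+1} ∪ ⋯ ∪ F_{2m+1} for every 1 ≤ j ≤ 2m. Summing the last term of (4) over 2 ≤ i ≤ 2m + 1 yields

$$\sum_{i=2}^{2m+1}|S_i|\sum_{j=1}^{i-1}(a_j-a_{j-1})=\sum_{j=1}^{2m}(a_j-a_{j-1})\sum_{i=j+1}^{2m+1}|S_i|=\sum_{j=1}^{2m}(a_j-a_{j-1})\,|S\cap\Gamma_j|.$$

Hence, summing (4) over 2 ≤ i ≤ 2m + 1 yields

$$\mathrm{cost}(S\cap\mathrm{range}(r),x)=\sum_{i=2}^{2m+1}\mathrm{cost}(S_i,x)=\sum_{i=2}^{2m+1}\sum_{f\in S_i}\bigl(f(x)-a_{i-1}\bigr)+\sum_{j=1}^{2m}(a_j-a_{j-1})\,|S\cap\Gamma_j|.\tag{5}$$

Similarly,

$$\mathrm{cost}(\mathrm{range}(r),x)=\sum_{i=2}^{2m+1}\sum_{f\in F_i}\bigl(f(x)-a_{i-1}\bigr)+\sum_{j=1}^{2m}(a_j-a_{j-1})\,|\Gamma_j|.\tag{6}$$

By the triangle inequality,

$$\left|\frac{\mathrm{cost}(\mathrm{range}(r),x)}{|F|}-\frac{\mathrm{cost}(S\cap\mathrm{range}(r),x)}{|S|}\right|\le\left|\frac{\mathrm{cost}(\mathrm{range}(r),x)}{|F|}-\sum_{j=1}^{2m}\frac{(a_j-a_{j-1})|\Gamma_j|}{|F|}\right|\tag{7}$$

$$+\sum_{j=1}^{2m}(a_j-a_{j-1})\left|\frac{|\Gamma_j|}{|F|}-\frac{|S\cap\Gamma_j|}{|S|}\right|\tag{8}$$

$$+\left|\sum_{j=1}^{2m}\frac{(a_j-a_{j-1})|S\cap\Gamma_j|}{|S|}-\frac{\mathrm{cost}(S\cap\mathrm{range}(r),x)}{|S|}\right|.\tag{9}$$

We now bound each term on the right-hand side of the last inequality. For f ∈ F_{2i+1} we have f(x) − a_{2i} = 0, while for f ∈ F_{2i} we have f(x) − a_{2i−1} ≤ a_{2i} − a_{2i−1}; moreover, |F_{2i}| ≤ ⌈εn⌉ − 1 < εn ≤ ε|F|, and Σ_{i=1}^{m}(a_{2i} − a_{2i−1}) = a_{2m} since a_{2i−1} = a_{2i−2}. Using (6), we therefore have

$$\left|\frac{\mathrm{cost}(\mathrm{range}(r),x)}{|F|}-\sum_{j=1}^{2m}\frac{(a_j-a_{j-1})|\Gamma_j|}{|F|}\right|=\sum_{i=2}^{2m+1}\sum_{f\in F_i}\frac{f(x)-a_{i-1}}{|F|}\le\sum_{i=1}^{m}(a_{2i}-a_{2i-1})\cdot\frac{|F_{2i}|}{|F|}\le\varepsilon a_{2m},$$

which bounds (7). Similarly, using (5),

$$\left|\sum_{j=1}^{2m}\frac{(a_j-a_{j-1})|S\cap\Gamma_j|}{|S|}-\frac{\mathrm{cost}(S\cap\mathrm{range}(r),x)}{|S|}\right|=\sum_{i=2}^{2m+1}\sum_{f\in S_i}\frac{f(x)-a_{i-1}}{|S|}\le\sum_{i=1}^{m}(a_{2i}-a_{2i-1})\cdot\frac{|S_{2i}|}{|S|}.\tag{10}$$

Since S is an ε-approximation for (F, ranges(F)), we have

$$\forall b\ge0:\quad\left|\frac{|\mathrm{range}(b)|}{|F|}-\frac{|S\cap\mathrm{range}(b)|}{|S|}\right|\le\varepsilon.\tag{11}$$

Put 1 ≤ i ≤ m, and let b_{2i} ∈ [a_{2i−1}, a_{2i}) be such that

$$F_{2i}=\mathrm{range}(b_{2i})\setminus\mathrm{range}(a_{2i-1});\tag{12}$$

such a value exists since F is finite. By (3), (11) and (12), we have

$$\frac{|S_{2i}|}{|S|}=\frac{|S\cap\mathrm{range}(b_{2i})|}{|S|}-\frac{|S\cap\mathrm{range}(a_{2i-1})|}{|S|}\le\frac{|\mathrm{range}(b_{2i})|}{|F|}-\frac{|\mathrm{range}(a_{2i-1})|}{|F|}+2\varepsilon=\frac{|F_{2i}|}{|F|}+2\varepsilon\le3\varepsilon.$$

Combining the last inequality with (10) bounds (9), as

$$\left|\sum_{j=1}^{2m}\frac{(a_j-a_{j-1})|S\cap\Gamma_j|}{|S|}-\frac{\mathrm{cost}(S\cap\mathrm{range}(r),x)}{|S|}\right|\le\sum_{i=1}^{m}(a_{2i}-a_{2i-1})\cdot3\varepsilon=3\varepsilon a_{2m}.\tag{13}$$

Finally, every Γ_j is the difference of two nested ranges: Γ_{2i−1} = range(r) \ range(a_{2i−1}) and, by (12), Γ_{2i} = range(r) \ range(b_{2i}). Applying (11) to both ranges gives | |Γ_j|/|F| − |S ∩ Γ_j|/|S| | ≤ 2ε, so expression (8) is bounded by

$$\sum_{j=1}^{2m}(a_j-a_{j-1})\left|\frac{|\Gamma_j|}{|F|}-\frac{|S\cap\Gamma_j|}{|S|}\right|\le\sum_{j=1}^{2m}(a_j-a_{j-1})\cdot2\varepsilon=2\varepsilon a_{2m}.$$

Combining the bounds on (7), (8) and (9) bounds the left-hand side of (7), as

$$\left|\frac{\mathrm{cost}(\mathrm{range}(r),x)}{|F|}-\frac{\mathrm{cost}(S\cap\mathrm{range}(r),x)}{|S|}\right|\le\varepsilon a_{2m}+2\varepsilon a_{2m}+3\varepsilon a_{2m}=6\varepsilon a_{2m}\le6\varepsilon r.$$

□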
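The passage from range-space discrepancy to the cost-based error of Definition 6.7 can be checked numerically on the real line with f_p(x) = |p − x|. In this one-dimensional setting the layer-cake identity Σ_{f∈range(x,r)} f(x) = ∫_0^r (|range(x, r)| − |range(x, t)|) dt shows that the error divided by r is at most 2ε, where ε is the interval discrepancy of S, so the sketch below compares the two quantities over a grid of (x, r) pairs. The helper names and data are ours, for illustration only.

```python
import random
from collections import Counter


def interval_discrepancy(F, S):
    """Range-space epsilon of the multiset S for intervals on the line (Definition 6.2)."""
    n, s, counts = len(F), len(S), Counter(S)
    taken, hi, lo = 0, 0.0, 0.0
    for k, p in enumerate(sorted(F), start=1):
        taken += counts[p]
        diff = k / n - taken / s
        hi, lo = max(hi, diff), min(lo, diff)
    return hi - lo


def function_error(F, S, x, r):
    """Left-hand side of Definition 6.7, divided by r, for f_p(x) = |p - x|."""
    cost_f = sum(abs(p - x) for p in F if abs(p - x) <= r)  # cost(range(x, r), x)
    cost_s = sum(abs(p - x) for p in S if abs(p - x) <= r)  # cost(S ∩ range(x, r), x)
    return abs(cost_f / len(F) - cost_s / len(S)) / r


rng = random.Random(9)
F = [rng.gauss(0.0, 1.0) for _ in range(500)]
S = rng.choices(F, k=400)
eps = interval_discrepancy(F, S)
worst = max(function_error(F, S, x / 4, r)
            for x in range(-12, 13) for r in (0.1, 0.5, 1.0, 2.0, 4.0))
print(eps, worst)  # worst should not exceed 2 * eps
```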



By plugging Theorem 6.3 into Theorem 6.8 we obtain the following corollary.

Theorem 6.9 Let F be a set of functions from X to [0, ∞), let d = dim(F), and let ε ∈ (0, 1) and δ > 0. Let S be a sample of

$$|S|=\frac{c}{\varepsilon^2}\left(d+\log\frac{1}{\delta}\right)$$

i.i.d. items from F, where c is a sufficiently large constant. Then, with probability at least 1 − δ, S is an ε-approximation of F.

7 ε-Approximations for High and Infinite Dimensional Spaces

Suppose that we have a range space of a high (possibly infinite) dimension d. In this section we show that for several natural families of high-dimensional range spaces, a small ε-approximation can be constructed that approximates (not all, but rather) a subset of the ranges in the range space. This weaker type of ε-approximation suffices to solve certain optimization problems in high-dimensional spaces. Towards this end, we define the notion of a generalized range space, the notion of a corresponding function space, and the notion of ε-approximation in this context. As before, these notions play a major role in our analysis.

Definition 7.1 (generalized range space.) Let $F$ be a set. Let $\mathrm{Ranges}$ be a function that maps every subset $S\subseteq F$ to a set $\mathrm{Ranges}(S)$ of subsets of $F$. The pair $(F,\mathrm{Ranges})$ is a generalized range space if for every two sets $S,G$ such that $S\subseteq G\subseteq F$, we have $\mathrm{Ranges}(S)\subseteq\mathrm{Ranges}(G)$. The dimension of a generalized range space $(F,\mathrm{Ranges})$ is the smallest integer $d$ such that
$$\forall S\subseteq F:\quad\left|\{S\cap\mathrm{range}\mid\mathrm{range}\in\mathrm{Ranges}(S)\}\right|\le|S|^{d}.\qquad(14)$$

We now define the generalized dimension of a family of functions:

Definition 7.2 (function space.) Let $F$ be a finite set of functions from a set $X$ to $[0,\infty)$. Let $\mathcal{X}$ be a function that maps every subset $S\subseteq F$ to a set of items $\mathcal{X}(S)\subseteq X$. The pair $(F,\mathcal{X})$ is called a function space if the pair $(F,\mathrm{Ranges})$ is a generalized range space, where $\mathrm{Ranges}$ is defined as follows. For every $x\in X$ and $r\ge 0$, let $\mathrm{range}(x,r)=\{f\in F\mid f(x)\le r\}$. For every $S\subseteq F$, let $\mathrm{Ranges}(S)=\{\mathrm{range}(x,r)\mid x\in\mathcal{X}(S),\,r\ge 0\}$. The dimension $\dim(F,\mathcal{X})$ of the function space $(F,\mathcal{X})$ is the dimension of the generalized range space $(F,\mathrm{Ranges})$.
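For intuition about (14), the sketch below counts the distinct traces $S\cap\mathrm{range}(x,r)$ in a toy function space. It is an illustration under assumptions that are not in the paper: points on the real line with $f_p(x)=|p-x|$, and the hypothetical choice $\mathcal{X}(S)=S$, i.e., centers restricted to the sample itself (this choice satisfies the monotonicity required in Definition 7.1). Each center contributes at most $|S|$ distinct traces, one per distinct distance, so the count never exceeds $|S|^2$ for this toy choice.

```python
from itertools import combinations


def traces(sample):
    """Distinct traces S ∩ range(c, r) over centers c in X(S) = S and radii r >= 0.

    Assumed toy setting: f_p(x) = |p - x| on the real line. Traces are stored as
    frozensets of indices so that repeated points remain distinguishable.
    """
    seen = set()
    for c in sample:
        # A trace only changes when r crosses one of the distances |p - c|.
        for r in sorted({abs(p - c) for p in sample}):
            seen.add(frozenset(i for i, p in enumerate(sample) if abs(p - c) <= r))
    return seen


if __name__ == "__main__":
    pts = [0.0, 1.0, 3.0, 7.0, 8.5]
    for k in range(1, len(pts) + 1):
        for sub in combinations(pts, k):
            assert len(traces(list(sub))) <= k ** 2
    print("all subsets satisfy |traces| <= |S|^2")
```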

We note that it is not hard to verify that if $\mathcal{X}(S)=X$ for every $S\subseteq F$, then $\dim(F,\mathcal{X})$ equals the dimension of the range space corresponding to $F$. For a subset $S$ of $F$, let $F|_{\mathcal{X}(S)}:\mathcal{X}(S)\to[0,\infty)$ be the function set which is defined by restricting the functions in $F$ to inputs in $\mathcal{X}(S)$. The following theorem is an immediate consequence of the proof in [LLS00] and can be seen as a corollary of Theorem 6.3.

Theorem 7.3 ($\varepsilon$-approximation for a function space) Let $(F,\mathcal{X})$ be a function space of dimension $d$ from $X$ to $[0,\infty)$. Let $\varepsilon,\delta>0$. Let $S$ be a sample of



i.i.d. functions from $F$, where $c$ is a sufficiently large constant. Then, with probability at least $1-\delta$, $S$ is an $\varepsilon$-approximation of the range space $(F,\mathrm{Ranges}(S))$.

The following is a simple corollary of Theorem 6.8 that connects the notion of $\varepsilon$-approximation for range spaces with that of $\varepsilon$-approximation for function sets in the generalized setting.

Corollary 7.4 Let $(F,\mathcal{X})$ be a function space of dimension $d$. Let $S$ be an $\varepsilon$-approximation of the range space $(F,\mathrm{Ranges}(S))$ for some $\varepsilon>0$. Then $S$ is an $\varepsilon$-approximation of $F|_{\mathcal{X}(S)}$.

Using Corollary 7.4 with Theorem 7.3, we now conclude: 

Theorem 7.5 Let $(F,\mathcal{X})$ be a function space of dimension $d$. Let $0<\varepsilon,\delta<1$, and let $S$ be a random sample of at least



i.i.d. functions from $F$, where $c$ is a sufficiently large constant. Then, with probability at least $1-\delta$, $S$ is an $\varepsilon$-approximation of $F|_{\mathcal{X}(S)}$.

8 From $\varepsilon$-approximations to $(\gamma,\varepsilon)$-coresets

In this section we define and analyze the notion of $(\gamma,\varepsilon)$-coresets: a relaxed notion of coresets (which we refer to as robust coresets) that we will use in our study of robust medians discussed in the Introduction. Roughly speaking, we show that $\varepsilon$-approximations for $F$ are also $(\gamma,\varepsilon)$-coresets.

Definition 8.1 ($(\gamma,\varepsilon)$-coreset.) Let $\varepsilon\in(0,1/2)$ and $\gamma\in(0,1]$. Let $F$ and $S$ be two sets of functions from a set $X$ to $[0,\infty)$. For every $x\in X$:

• Let $F_x$ denote the $\lceil\gamma|F|\rceil$ functions $f\in F$ with the smallest value $f(x)$.

• Let $S_x$ denote the $\lceil(1-\varepsilon)\gamma|S|\rceil$ functions $f\in S$ with the smallest value $f(x)$.

• Let $G_x\subseteq F_x$ denote the $\lceil(1-2\varepsilon)\gamma|F|\rceil$ functions $f\in F$ with the smallest value $f(x)$.

The set $S$ is $(\gamma,\varepsilon)$-good for $F$ if
$$\forall x\in X:\quad(1-\varepsilon)\cdot\frac{\mathrm{cost}(G_x,x)}{|G_x|}\le\frac{\mathrm{cost}(S_x,x)}{|S_x|}\le\frac{\mathrm{cost}(F_x,x)}{|F_x|}\cdot(1+\varepsilon).\qquad(16)$$

The set $S$ is a $(\gamma,\varepsilon)$-coreset of $F$ if for every $\gamma'\in[\gamma,1]$ and $\varepsilon'\in[\varepsilon,1/2)$, we have that $S$ is $(\gamma',\varepsilon')$-good for $F$.
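Inequality (16) can be tested directly once finitely many candidate items $x$ are fixed. The sketch below is a hypothetical helper, not part of the paper: `F` and `S` are lists of Python callables, each set $A_x$ is represented only through its values $f(x)$, and the counts $\lceil\cdot\rceil$ are computed with a small floating-point tolerance and clamped to at least 1. A `True` result only says that (16) holds at the tested points; it does not certify $(\gamma,\varepsilon)$-goodness over all of $X$.

```python
import math


def ceil_count(v):
    """ceil(v) with a small tolerance against float noise, and at least 1."""
    return max(1, math.ceil(v - 1e-9))


def avg_smallest(values, k):
    """Average of the k smallest values, i.e. cost(A_x, x) / |A_x| for a prefix A_x."""
    return sum(sorted(values)[:k]) / k


def is_good(F, S, gamma, eps, queries):
    """Check inequality (16) of Definition 8.1 at each x in `queries`."""
    for x in queries:
        fv = [f(x) for f in F]
        sv = [f(x) for f in S]
        avg_F = avg_smallest(fv, ceil_count(gamma * len(F)))                # F_x
        avg_S = avg_smallest(sv, ceil_count((1 - eps) * gamma * len(S)))    # S_x
        avg_G = avg_smallest(fv, ceil_count((1 - 2 * eps) * gamma * len(F)))  # G_x
        if not ((1 - eps) * avg_G <= avg_S <= avg_F * (1 + eps)):
            return False
    return True


if __name__ == "__main__":
    F = [lambda x, p=p: abs(p - x) for p in range(10)]
    print(is_good(F, F[::2], gamma=0.8, eps=0.1, queries=[0, 4.5, 9]))
```

Checking $(\gamma,\varepsilon)$-coreset status in the sense of Definition 8.1 would require repeating the test for every $\gamma'\ge\gamma$ and $\varepsilon'\ge\varepsilon$.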

Our definition of robust coresets has the flavor of approximation with outliers. Namely, in our definition, we allow a portion of the functions in both $F$ and $S$ to be neglected when considering the quality of $S$. In what follows, we show that an $\varepsilon$-approximation $S$ of a function set $F$ is also a robust coreset.

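Theorem 8.2 Let $\varepsilon\in(0,1)$. Let $F$ be a set of functions from $X$ to $[0,\infty)$, and let $S$ be an $(\varepsilon/7)$-approximation of the range space corresponding to $F$. Suppose that $|F|,|S|\ge 5/\varepsilon$. Let $\gamma\in(0,1]$, and for every $x\in X$:

• Let $F_x$ denote the $\lceil\gamma\cdot|F|\rceil$ functions $f\in F$ with the smallest value $f(x)$.

• Let $S_x$ denote the $\lceil\gamma\cdot|S|\rceil$ functions $f\in S$ with the smallest value $f(x)$.

Then
$$\forall x\in X:\quad\left|\frac{\mathrm{cost}(F_x,x)}{|F|}-\frac{\mathrm{cost}(S_x,x)}{|S|}\right|\le\varepsilon\cdot\max_{f\in F_x\cup S_x}f(x).$$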
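The two quantities compared in Theorem 8.2 can be evaluated directly for one fixed $x$. The sketch below is illustrative only (the name `trimmed_gap` is hypothetical): it takes the values $f(x)$ for $f\in F$ and for $f\in S$ and returns the left hand side together with $\max_{f\in F_x\cup S_x}f(x)$, so that their ratio can be compared against $\varepsilon$. Note that the costs are normalized by $|F|$ and $|S|$, not by $|F_x|$ and $|S_x|$.

```python
import math


def trimmed_gap(fvals, svals, gamma):
    """Both sides of Theorem 8.2 at one fixed x.

    fvals / svals hold the values f(x) for f in F and f in S. Returns
    (|cost(F_x,x)/|F| - cost(S_x,x)/|S||, max of f(x) over F_x ∪ S_x),
    where F_x, S_x keep the ceil(gamma*|F|) and ceil(gamma*|S|) smallest values.
    """
    fx = sorted(fvals)[: math.ceil(gamma * len(fvals))]
    sx = sorted(svals)[: math.ceil(gamma * len(svals))]
    gap = abs(sum(fx) / len(fvals) - sum(sx) / len(svals))
    return gap, max(fx + sx)


if __name__ == "__main__":
    gap, m = trimmed_gap([1, 2, 3, 4], [1, 3], gamma=0.5)
    print(gap, m, gap / m)  # Theorem 8.2 asks for gap <= eps * m
```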



Proof. Let $\varepsilon\in(0,1/7)$, and let $S$ be an $\varepsilon$-approximation of the range space corresponding to $F$. By Theorem 6.8, $S$ is also an $\varepsilon$-approximation of $F$. Let $\gamma$, $S_x$ and $F_x$ be defined as in the statement of the theorem. We will prove that
$$\forall x\in X:\quad\left|\frac{\mathrm{cost}(F_x,x)}{|F|}-\frac{\mathrm{cost}(S_x,x)}{|S|}\right|\le 7\varepsilon\cdot\max_{f\in F_x\cup S_x}f(x).\qquad(17)$$

This suffices to prove the theorem for $\varepsilon\in(0,1)$.

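Indeed, for every $x\in X$ and $r\ge 0$, we define $\mathrm{range}(x,r)=\{f\in F\mid f(x)\le r\}$. By our definitions,
$$\forall x\in X,\ r\ge 0:\quad\left|\frac{\mathrm{cost}(\mathrm{range}(x,r),x)}{|F|}-\frac{\mathrm{cost}(S\cap\mathrm{range}(x,r),x)}{|S|}\right|\le\varepsilon r\qquad(18)$$
and
$$\forall x\in X,\ r\ge 0:\quad\left|\frac{|\mathrm{range}(x,r)|}{|F|}-\frac{|S\cap\mathrm{range}(x,r)|}{|S|}\right|\le\varepsilon.\qquad(19)$$
Fix $x\in X$, and let $r=\max_{f\in F_x\cup S_x}f(x)$ and $Y=\{f\in F\mid f(x)<r\}$. We have
$$\mathrm{cost}(F_x,x)=\mathrm{cost}(F_x\cap Y,x)+\mathrm{cost}(F_x\setminus Y,x)=\mathrm{cost}(F_x\cap Y,x)+r\cdot|F_x|-r\cdot|F_x\cap Y|.$$
Similarly,
$$\mathrm{cost}(S_x,x)=\mathrm{cost}(S_x\cap Y,x)+\mathrm{cost}(S_x\setminus Y,x)=\mathrm{cost}(S_x\cap Y,x)+r\cdot|S_x|-r\cdot|S_x\cap Y|.$$
Let $c_1=5$. Since $|S|,|F|\ge c_1/\varepsilon$, we have that
$$\left|\frac{|F_x|}{|F|}-\frac{|S_x|}{|S|}\right|\le\max\left\{\frac{1}{|F|},\frac{1}{|S|}\right\}\le\frac{\varepsilon}{c_1}.\qquad(20)$$

Using the last equations, we have
$$\begin{aligned}
\left|\frac{\mathrm{cost}(F_x,x)}{|F|}-\frac{\mathrm{cost}(S_x,x)}{|S|}\right|
&=\left|\frac{\mathrm{cost}(F_x\cap Y,x)+r|F_x|-r|F_x\cap Y|}{|F|}-\frac{\mathrm{cost}(S_x\cap Y,x)+r|S_x|-r|S_x\cap Y|}{|S|}\right|\\
&\le\left|\frac{\mathrm{cost}(F_x\cap Y,x)-r|F_x\cap Y|}{|F|}-\frac{\mathrm{cost}(S_x\cap Y,x)-r|S_x\cap Y|}{|S|}\right|+r\left|\frac{|F_x|}{|F|}-\frac{|S_x|}{|S|}\right|\\
&\le\left|\frac{\mathrm{cost}(F_x\cap Y,x)}{|F|}-\frac{\mathrm{cost}(S_x\cap Y,x)}{|S|}\right|+r\left|\frac{|F_x\cap Y|}{|F|}-\frac{|S_x\cap Y|}{|S|}\right|+\frac{\varepsilon r}{c_1}.
\end{aligned}\qquad(21)$$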

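We now bound each of the terms in the right hand side of (21). Using the triangle inequality,
$$\begin{aligned}
\left|\frac{\mathrm{cost}(Y\cap F_x,x)}{|F|}-\frac{\mathrm{cost}(Y\cap S_x,x)}{|S|}\right|
&\le\left|\frac{\mathrm{cost}(Y\cap F_x,x)}{|F|}-\frac{\mathrm{cost}(Y,x)}{|F|}\right|+\left|\frac{\mathrm{cost}(Y,x)}{|F|}-\frac{\mathrm{cost}(Y\cap S,x)}{|S|}\right|+\left|\frac{\mathrm{cost}(Y\cap S,x)}{|S|}-\frac{\mathrm{cost}(Y\cap S_x,x)}{|S|}\right|\\
&\le\frac{r\,|Y\setminus F_x|}{|F|}+\left|\frac{\mathrm{cost}(Y,x)}{|F|}-\frac{\mathrm{cost}(Y\cap S,x)}{|S|}\right|+\frac{r\,|(Y\cap S)\setminus S_x|}{|S|}.
\end{aligned}$$
Similarly,
$$\begin{aligned}
\left|\frac{|Y\cap F_x|}{|F|}-\frac{|Y\cap S_x|}{|S|}\right|
&\le\left|\frac{|Y\cap F_x|}{|F|}-\frac{|Y|}{|F|}\right|+\left|\frac{|Y|}{|F|}-\frac{|Y\cap S|}{|S|}\right|+\left|\frac{|Y\cap S|}{|S|}-\frac{|Y\cap S_x|}{|S|}\right|\\
&=\frac{|Y\setminus F_x|}{|F|}+\left|\frac{|Y|}{|F|}-\frac{|Y\cap S|}{|S|}\right|+\frac{|(Y\cap S)\setminus S_x|}{|S|}.
\end{aligned}$$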

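Combining the last two equations in (21) yields
$$\left|\frac{\mathrm{cost}(F_x,x)}{|F|}-\frac{\mathrm{cost}(S_x,x)}{|S|}\right|\le\left|\frac{\mathrm{cost}(Y,x)}{|F|}-\frac{\mathrm{cost}(Y\cap S,x)}{|S|}\right|+r\left|\frac{|Y|}{|F|}-\frac{|Y\cap S|}{|S|}\right|+2r\cdot\frac{|Y\setminus F_x|}{|F|}+2r\cdot\frac{|(Y\cap S)\setminus S_x|}{|S|}+\frac{\varepsilon r}{c_1}.\qquad(22)$$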



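Since $F$ is finite, $Y=\mathrm{range}(x,r')$ for some $r'<r$. Hence, by (18) we bound the first term in the right hand side of (22) by $\varepsilon r$, and using (19) we bound the second term by $\varepsilon r$. We thus obtain
$$\left|\frac{\mathrm{cost}(Y,x)}{|F|}-\frac{\mathrm{cost}(Y\cap S,x)}{|S|}\right|+r\left|\frac{|Y|}{|F|}-\frac{|Y\cap S|}{|S|}\right|\le 2\varepsilon r.\qquad(23)$$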



We now bound the other terms in the right hand side of (22). By the definition of $r$ and $Y$, we have either $Y\subseteq F_x$ or $Y\cap S\subseteq S_x$ (or both). Hence, $|Y|\le|F_x|$ or $|Y\cap S|\le|S_x|$. By (19) we have
$$\left|\frac{|Y|}{|F|}-\frac{|Y\cap S|}{|S|}\right|\le\varepsilon.\qquad(24)$$
Using the last three equations and (20), we obtain
$$\frac{|Y|}{|F|}-\frac{|F_x|}{|F|}\le\max\left\{0,\ \frac{|Y\cap S|}{|S|}+\varepsilon-\frac{|F_x|}{|F|}\right\}\le\max\left\{0,\ \frac{|S_x|}{|S|}-\frac{|F_x|}{|F|}+\varepsilon\right\}\le\frac{\varepsilon}{c_1}+\varepsilon.$$
Since both $F_x$ and $Y$ contain the functions with the smallest values $f(x)$, we have $|F_x\cap Y|=\min\{|F_x|,|Y|\}$. Together with the previous equation, we obtain
$$\frac{|Y\setminus F_x|}{|F|}=\frac{|Y|-|F_x\cap Y|}{|F|}=\max\left\{0,\ \frac{|Y|-|F_x|}{|F|}\right\}\le\frac{\varepsilon}{c_1}+\varepsilon.\qquad(25)$$

Similarly, we bound the rightmost term in (22). As stated above, we have $|Y|\le|F_x|$ or $|Y\cap S|\le|S_x|$. Using (24) with the last two inequalities yields
$$\frac{|Y\cap S|-|S_x|}{|S|}\le\max\left\{0,\ \frac{|Y|}{|F|}+\varepsilon-\frac{|S_x|}{|S|}\right\}\le\max\left\{0,\ \frac{|F_x|}{|F|}+\varepsilon-\frac{|S_x|}{|S|}\right\}\le\frac{\varepsilon}{c_1}+\varepsilon,$$
where the last derivation follows from (20). We have $|Y\cap S\cap S_x|=\min\{|Y\cap S|,|S_x|\}$. Together with the previous equation, we obtain
$$\frac{|(Y\cap S)\setminus S_x|}{|S|}=\max\left\{0,\ \frac{|Y\cap S|-|S_x|}{|S|}\right\}\le\frac{\varepsilon}{c_1}+\varepsilon.$$

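Combining (23), (25) and the last equation in (22) proves (17) as follows.
$$\left|\frac{\mathrm{cost}(F_x,x)}{|F|}-\frac{\mathrm{cost}(S_x,x)}{|S|}\right|\le 2\varepsilon r+2r\left(\varepsilon+\frac{\varepsilon}{c_1}\right)+2r\left(\varepsilon+\frac{\varepsilon}{c_1}\right)+\frac{\varepsilon r}{c_1}=6\varepsilon r+\frac{5\varepsilon r}{c_1}\le 7\varepsilon r=7\varepsilon\cdot\max_{f\in F_x\cup S_x}f(x).$$
□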



We are now ready to state the connection between $\varepsilon$-approximations and $(\gamma,\varepsilon)$-coresets.

Theorem 8.3 Let $\varepsilon\in(0,1/4)$ and $\gamma\in(0,1]$. Let $F$ be a set of functions from a set $X$ to $[0,\infty)$, and let $S$ be an $(\varepsilon^2\gamma/63)$-approximation of the range space corresponding to $F$ (and thus also of the function set $F$), such that $|S|,|F|\ge 5/(\varepsilon^2\gamma)$. Then $S$ is a $(\gamma,\varepsilon)$-coreset of $F$.

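Proof. Let $\varepsilon\in(0,1/12)$ and let $S$ be an $(\varepsilon^2\gamma/7)$-approximation of $F$ such that $|S|\ge 5/(\varepsilon^2\gamma)$. We will prove that $S$ is $(\gamma,3\varepsilon)$-good for $F$; see Definition 8.1. By our definitions, $S$ is also an $(\varepsilon'^2\gamma'/7)$-approximation of $F$ for every $\gamma'\ge\gamma$ and $\varepsilon'\ge\varepsilon$. Hence, $S$ is $(\gamma',3\varepsilon')$-good for every $\gamma'\ge\gamma$ and $\varepsilon'\ge\varepsilon$. This suffices to prove that $S$ is a $(\gamma,\varepsilon)$-coreset, by replacing $\varepsilon$ with $\varepsilon/3$.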

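Indeed, let $G_x$ be the $\lceil(1-6\varepsilon)\gamma|F|\rceil$ functions $f\in F$ with the smallest value $f(x)$, and let $S_x$ denote the $\lceil(1-3\varepsilon)\gamma|S|\rceil$ functions $f\in S$ with the smallest value $f(x)$. As in Definition 8.1, $F_x$ denotes the $\lceil\gamma|F|\rceil$ functions $f\in F$ with the smallest value $f(x)$. In order to prove that $S$ is $(\gamma,3\varepsilon)$-good for $F$, we need to prove that
$$\forall x\in X:\quad(1-3\varepsilon)\cdot\frac{\mathrm{cost}(G_x,x)}{|G_x|}\le\frac{\mathrm{cost}(S_x,x)}{|S_x|}\le\frac{\mathrm{cost}(F_x,x)}{|F_x|}\cdot(1+3\varepsilon).\qquad(26)$$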

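Fix $x\in X$, and let $H_x$ denote the $\lceil\gamma(1-3\varepsilon)|F|\rceil$ functions $f\in F$ with the smallest value $f(x)$. We first bound the right hand side of (26). By Theorem 8.2, we have
$$\left|\frac{\mathrm{cost}(S_x,x)}{|S|}-\frac{\mathrm{cost}(H_x,x)}{|F|}\right|\le\varepsilon^2\gamma\cdot\max_{f\in H_x\cup S_x}f(x).\qquad(27)$$
Since $1\le\varepsilon\gamma|F|$, we have
$$|H_x|\le(1-3\varepsilon)\gamma|F|+1\le(1-2\varepsilon)\gamma|F|\le(1-2\varepsilon)|F_x|.$$
By the last equation and Markov's inequality,
$$\max_{f\in H_x}f(x)\le\frac{1}{2\varepsilon}\cdot\frac{\mathrm{cost}(F_x,x)}{|F_x|}.\qquad(28)$$
Let $U=\{f\in F\mid f(x)<\max_{f'\in S_x}f'(x)\}$. Since $S\cap U\subsetneq S_x$, we have
$$|S\cap U|\le(1-3\varepsilon)\gamma|S|.\qquad(29)$$
Since $S$ is an $(\varepsilon^2\gamma/7)$-approximation of $(F,\mathrm{ranges}(F))$, we have
$$\left|\frac{|U|}{|F|}-\frac{|S\cap U|}{|S|}\right|\le\frac{\varepsilon^2\gamma}{7}.$$
By (29) and the last equation, we obtain
$$\frac{|U|}{|F|}\le\frac{|S\cap U|}{|S|}+\frac{\varepsilon^2\gamma}{7}\le(1-2\varepsilon)\gamma\le(1-2\varepsilon)\frac{|F_x|}{|F|}.$$
Hence,
$$\left|\left\{f\in F_x\;\middle|\;f(x)\ge\max_{f'\in S_x}f'(x)\right\}\right|\ge|F_x|-|F_x\cap U|\ge|F_x|-|U|\ge 2\varepsilon|F_x|.$$
Using the last equation with Markov's inequality, we conclude that $\max_{f\in S_x}f(x)\le\mathrm{cost}(F_x,x)/(2\varepsilon|F_x|)$. By this and (28), we obtain
$$\max_{f\in H_x\cup S_x}f(x)\le\frac{\mathrm{cost}(F_x,x)}{2\varepsilon|F_x|}.\qquad(30)$$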
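Both appeals to Markov's inequality in this proof have the same shape: if at least $2\varepsilon|A|$ elements of a finite set $A$ have value at least $m$, then $m\le\mathrm{cost}(A,x)/(2\varepsilon|A|)$, because those elements alone contribute at least $2\varepsilon|A|\cdot m$ to the non-negative total. The sketch below is a quick randomized sanity check of that step (an illustration with a hypothetical helper name, assuming $0<\varepsilon\le 1/2$), not a proof.

```python
import math
import random


def markov_threshold(values, eps):
    """Largest m such that at least 2*eps*len(values) of the values are >= m.

    Returns (m, sum(values) / (2 * eps * len(values))); assumes 0 < eps <= 1/2.
    """
    n = len(values)
    j = math.ceil(2 * eps * n)             # need at least 2*eps*n values >= m
    m = sorted(values, reverse=True)[j - 1]
    return m, sum(values) / (2 * eps * n)


if __name__ == "__main__":
    random.seed(1)
    for _ in range(1000):
        vals = [random.expovariate(1.0) for _ in range(random.randint(1, 50))]
        eps = random.uniform(0.01, 0.25)
        m, bound = markov_threshold(vals, eps)
        assert m <= bound + 1e-12
    print("Markov step verified on random inputs")
```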



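Since this theorem assumes $\varepsilon\gamma|F|\ge 1$, we have
$$\frac{1}{|F|}=\frac{(1-2\varepsilon)\gamma}{(1-2\varepsilon)\gamma|F|}\le\frac{(1-2\varepsilon)\gamma}{(1-3\varepsilon)\gamma|F|+1}\le\frac{(1-2\varepsilon)\gamma}{|H_x|}.$$
Combining the last equation and (30) in (27) yields
$$\frac{\mathrm{cost}(S_x,x)}{|S|}\le\frac{\mathrm{cost}(H_x,x)}{|F|}+\varepsilon^2\gamma\cdot\max_{f\in H_x\cup S_x}f(x)\le(1-2\varepsilon)\gamma\cdot\frac{\mathrm{cost}(H_x,x)}{|H_x|}+\varepsilon\gamma\cdot\frac{\mathrm{cost}(F_x,x)}{2|F_x|}.\qquad(31)$$
Since $H_x$ contains the $|H_x|$ functions $f\in F_x$ with the smallest value $f(x)$, we have that $\mathrm{cost}(H_x,x)/|H_x|\le\mathrm{cost}(F_x,x)/|F_x|$. Using this in (31) yields
$$\frac{\mathrm{cost}(S_x,x)}{|S|}\le(1-2\varepsilon)\gamma\cdot\frac{\mathrm{cost}(F_x,x)}{|F_x|}+\varepsilon\gamma\cdot\frac{\mathrm{cost}(F_x,x)}{2|F_x|}\le(1-\varepsilon)\gamma\cdot\frac{\mathrm{cost}(F_x,x)}{|F_x|}.$$
Multiplying the last equation by $|S|/|S_x|\le 1/((1-3\varepsilon)\gamma)$ bounds the right hand side of (26) as follows:
$$\frac{\mathrm{cost}(S_x,x)}{|S_x|}\le\frac{(1-\varepsilon)\gamma}{(1-3\varepsilon)\gamma}\cdot\frac{\mathrm{cost}(F_x,x)}{|F_x|}\le(1+3\varepsilon)\cdot\frac{\mathrm{cost}(F_x,x)}{|F_x|}.\qquad(32)$$

We now bound the left hand side of (26) in a similar way. Let $T_x$ denote the $\lceil\gamma(1-6\varepsilon)|S|\rceil$ functions $f\in S$ with the smallest value $f(x)$. Since $1\le\varepsilon\gamma|S|$, we have
$$|T_x|\le(1-6\varepsilon)\gamma|S|+1\le(1-5\varepsilon)\gamma|S|\le(1-2\varepsilon)(1-3\varepsilon)\gamma|S|\le(1-2\varepsilon)|S_x|.$$
By the last equation and Markov's inequality,
$$\max_{f\in T_x}f(x)\le\frac{1}{2\varepsilon}\cdot\frac{\mathrm{cost}(S_x,x)}{|S_x|}.\qquad(33)$$
Let $Y=\{f\in F\mid f(x)<\max_{f'\in G_x}f'(x)\}$. Since $Y\subsetneq G_x$, we have $|Y|\le(1-6\varepsilon)\gamma|F|$. Since $S$ is an $(\varepsilon^2\gamma/7)$-approximation of $F$, substituting $r=\max_{f\in Y}f(x)$ in Definition 6.2 yields
$$\frac{|S\cap Y|}{|S|}\le\frac{|Y|}{|F|}+\frac{\varepsilon^2\gamma}{7}\le(1-2\varepsilon)(1-3\varepsilon)\gamma\le(1-2\varepsilon)\frac{|S_x|}{|S|}.$$
That is, $|S\cap Y|\le(1-2\varepsilon)|S_x|$. Hence,
$$\left|\left\{f\in S_x\;\middle|\;f(x)\ge\max_{f'\in G_x}f'(x)\right\}\right|\ge|S_x|-|S_x\cap Y|\ge|S_x|-|S\cap Y|\ge 2\varepsilon|S_x|.$$
Using the last equation with Markov's inequality, we conclude that $\max_{f\in G_x}f(x)\le\mathrm{cost}(S_x,x)/(2\varepsilon|S_x|)$. By this and (33), we obtain
$$\max_{f\in G_x\cup T_x}f(x)\le\frac{\mathrm{cost}(S_x,x)}{2\varepsilon|S_x|}.$$
Since $\varepsilon\gamma|S|\ge 1$, we have
$$\frac{1}{|S|}=\frac{(1-5\varepsilon)\gamma}{(1-5\varepsilon)\gamma|S|}\le\frac{(1-5\varepsilon)\gamma}{(1-6\varepsilon)\gamma|S|+1}\le\frac{(1-5\varepsilon)\gamma}{|T_x|}.$$
By Theorem 8.2, we have
$$\frac{\mathrm{cost}(G_x,x)}{|F|}\le\frac{\mathrm{cost}(T_x,x)}{|S|}+\varepsilon^2\gamma\cdot\max_{f\in G_x\cup T_x}f(x).$$
Combining the last three equations yields
$$\begin{aligned}
\frac{\mathrm{cost}(G_x,x)}{|F|}&\le\frac{\mathrm{cost}(T_x,x)}{|S|}+\varepsilon^2\gamma\cdot\max_{f\in G_x\cup T_x}f(x)\\
&\le(1-5\varepsilon)\gamma\cdot\frac{\mathrm{cost}(T_x,x)}{|T_x|}+\varepsilon\gamma\cdot\frac{\mathrm{cost}(S_x,x)}{2|S_x|}\\
&\le(1-5\varepsilon)\gamma\cdot\frac{\mathrm{cost}(S_x,x)}{|S_x|}+\varepsilon\gamma\cdot\frac{\mathrm{cost}(S_x,x)}{2|S_x|}\le(1-4\varepsilon)\gamma\cdot\frac{\mathrm{cost}(S_x,x)}{|S_x|}.
\end{aligned}$$
Multiplying the last equation by $(1-3\varepsilon)|F|/|G_x|$ yields
$$(1-3\varepsilon)\cdot\frac{\mathrm{cost}(G_x,x)}{|G_x|}\le\frac{(1-3\varepsilon)(1-4\varepsilon)\gamma|F|}{|G_x|}\cdot\frac{\mathrm{cost}(S_x,x)}{|S_x|}\le\frac{(1-3\varepsilon)(1-4\varepsilon)}{1-6\varepsilon}\cdot\frac{\mathrm{cost}(S_x,x)}{|S_x|}\le\frac{\mathrm{cost}(S_x,x)}{|S_x|}.$$
The last equation and (32) prove (26) as desired. □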
Using Theorems 6.3 and 8.3, we get the following corollary.
Corollary 8.4 Let $\varepsilon\in(0,1/4)$ and $\gamma\in(0,1]$. Let $F$ be a set of functions from a set $X$ to $[0,\infty)$. Let $S$ be a sample of at least



i.i.d. functions from $F$, where $c$ is a sufficiently large constant. Suppose $|F|\ge|S|$. Then, with probability at least $1-\delta$, $S$ is a $(\gamma,\varepsilon)$-coreset of $F$.

9 Robust medians: From $(\gamma,\varepsilon)$-coresets to $(\gamma,\varepsilon,\alpha,\beta)$-medians

In this section we discuss the notion of robust medians stated in the Introduction and tie it to the notion of $(\gamma,\varepsilon)$-coresets discussed in the last section. Roughly speaking, a robust median is a subset $Y$ of points from $X$ that acts as a bicriteria clustering of $F$ when outliers are considered. More specifically, our robust medians are parametrized by four parameters: $\gamma$, $\varepsilon$, $\alpha$ and $\beta$. The parameter $\gamma$ (or, to be precise, $1-\gamma$) specifies the fraction of outliers considered. The parameter $\varepsilon$ is a slackness parameter crucial to the proofs of the theorems to come. The parameter $\alpha$ is the approximation ratio between the clustering obtained by $Y$ and the optimal 1-median clustering. Finally, the parameter $\beta$ denotes the size of $Y$. In several cases, we will simply take $\beta$ to be 1, and will remove the parameter $\beta$ from our notation.

Definition 9.1 (cost to a set of items) For a set $Y\subseteq X$, we denote $\mathrm{Cost}(F,Y)=\sum_{f\in F}\min_{y\in Y}f(y)$.

Definition 9.2 (robust median) Let $F$ be a set of $n$ functions from a set $X$ to $[0,\infty)$. Let $0<\varepsilon,\gamma<1$, and $\alpha>0$. For every $x\in X$, let $F_x$ denote the $\lceil\gamma n\rceil$ functions $f\in F$ with the smallest value $f(x)$. Let $Y\subseteq X$, and let $G$ be the set of the $\lceil(1-\varepsilon)\gamma n\rceil$ functions $f\in F$ with the smallest value $f(Y)=\min_{y\in Y}f(y)$. The set $Y$ is called a $(\gamma,\varepsilon,\alpha,\beta)$-median of $F$ if $|Y|=\beta$ and
$$\mathrm{Cost}(G,Y)\le\alpha\min_{x\in X}\mathrm{cost}(F_x,x).$$
For simplicity of notation, a $(\gamma,\varepsilon,\alpha)$-median is a shorthand for a $(\gamma,\varepsilon,\alpha,1)$-median.

Let $F$ be a set of functions from $X$ to $[0,\infty)$. In the previous section we proved that a small $(\gamma,\varepsilon)$-coreset of $F$ can be constructed using algorithms that compute an $\varepsilon$-approximation of $F$. In particular, a random sample $S$ of $F$ is such a $(\gamma,\varepsilon)$-coreset. In this section we prove that a $(\gamma,\varepsilon,\alpha)$-median of $S$ is also an $(O(\gamma),O(\varepsilon),\alpha)$-median of $F$. In other words, if we have a (possibly inefficient) algorithm for computing a $(\gamma,\varepsilon)$-median of a small coreset $S$, then we can compute a similar median for the original set $F$ in time linear in $n$.

Theorem 9.3 Let $F$ be a set of functions from a set $X$ to $[0,\infty)$. Let $\varepsilon\in(0,1/10)$ and $\gamma\in(0,1]$. Suppose that $S$ is a $(\gamma,\varepsilon)$-coreset of $F$, and that $|F|\ge|S|\ge 2/(\varepsilon\gamma)$. Let $\alpha>0$. Then a $((1-\varepsilon)\gamma,\varepsilon,\alpha)$-median of $S$ is also a $(\gamma,4\varepsilon,\alpha)$-median of $F$.

Proof. For every $x\in X$, let $F_x$ denote the $\lceil\gamma|F|\rceil$ functions $f\in F$ with the smallest value $f(x)$.

• Let $x^*\in X$ and $F^*\subseteq F$ be such that $|F^*|=\lceil\gamma|F|\rceil$ and $\mathrm{cost}(F^*,x^*)=\min_{x\in X}\mathrm{cost}(F_x,x)$.

• Let $x'$ be a $((1-\varepsilon)\gamma,\varepsilon,\alpha)$-median for $S$.

• Let $G$ denote the $\lceil(1-4\varepsilon)\gamma|F|\rceil$ functions $f\in F$ with the smallest value $f(x')$.

• Let $S'$ denote the $\lceil(1-3\varepsilon)\gamma|S|\rceil$ functions $f\in S$ with the smallest value $f(x')$.

• Let $S^*$ denote the $\lceil(1-\varepsilon)\gamma|S|\rceil$ functions $f\in S$ with the smallest value $f(x^*)$.

We have

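$$|S'|=\lceil(1-3\varepsilon)\gamma|S|\rceil\le\lceil(1-\varepsilon)(1-\varepsilon)\gamma|S|\rceil.\qquad(34)$$
By definition, $|S^*|=\lceil(1-\varepsilon)\gamma|S|\rceil$. Using this, (34) and the fact that $x'$ is a $((1-\varepsilon)\gamma,\varepsilon,\alpha)$-median of $S$, we have
$$\mathrm{cost}(S',x')\le\alpha\cdot\mathrm{cost}(S^*,x^*).\qquad(35)$$
Since $S$ is a $(\gamma,\varepsilon)$-coreset of $F$, it is $(\gamma,\varepsilon)$-good for $F$; see Definition 8.1. By this, and since $|G|\le\lceil(1-2\varepsilon)\gamma|F|\rceil$ and $|S'|\ge(1-3\varepsilon)\gamma|S|$, we obtain
$$(1-\varepsilon)\cdot\frac{\mathrm{cost}(G,x')}{|G|}\le\frac{\mathrm{cost}(S',x')}{|S'|}.$$
Since $S$ is a $(\gamma,\varepsilon)$-coreset of $F$, we have that
$$\frac{\mathrm{cost}(S^*,x^*)}{|S^*|}\le(1+\varepsilon)\cdot\frac{\mathrm{cost}(F^*,x^*)}{|F^*|}.$$
By (35) and the last two equations, we obtain
$$\mathrm{cost}(G,x')\le\frac{|G|\,\mathrm{cost}(S',x')}{(1-\varepsilon)|S'|}\le\frac{|G|\,\alpha\,\mathrm{cost}(S^*,x^*)}{(1-\varepsilon)|S'|}\le\frac{|G|\,\alpha}{(1-\varepsilon)|S'|}\cdot\frac{|S^*|(1+\varepsilon)}{|F^*|}\cdot\mathrm{cost}(F^*,x^*).\qquad(36)$$
By the assumption of the theorem, we have $|S|\ge 2/(\varepsilon\gamma)$, so $1\le\varepsilon\gamma|S|/2$. Hence,
$$|S^*|\le(1-\varepsilon)\gamma|S|+1\le(1-\varepsilon/2)\gamma|S|.$$
Similarly, since $1\le 4\varepsilon\gamma|F|/2$,
$$|G|\le(1-4\varepsilon)\gamma|F|+1\le(1-4\varepsilon/2)\gamma|F|.$$
Therefore,
$$\frac{|G|\,\alpha}{(1-\varepsilon)|S'|}\cdot\frac{|S^*|(1+\varepsilon)}{|F^*|}\le\frac{(1-4\varepsilon)\gamma|F|\,\alpha}{(1-\varepsilon)(1-3\varepsilon)\gamma|S|}\cdot\frac{(1-\varepsilon)\gamma|S|\,(1+\varepsilon)}{\gamma|F|}=\frac{(1-4\varepsilon)(1+\varepsilon)}{1-3\varepsilon}\cdot\alpha\le\alpha.$$
Using the last equation with (36) yields
$$\mathrm{cost}(G,x')\le\alpha\cdot\mathrm{cost}(F^*,x^*).$$
Hence, $x'$ is a $(\gamma,4\varepsilon,\alpha)$-median of $F$ as desired. □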
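Definition 9.2 can be checked by brute force on small instances. The sketch below is an illustration under assumptions that are not part of the paper: every function is $f_p(x)=|p-x|$ for a point $p$ on the real line, and the item set $X$ is a finite list of candidate centers, so that $\min_{x\in X}\mathrm{cost}(F_x,x)$ can be computed exhaustively. The helper names are hypothetical.

```python
import math


def smallest_sum(values, k):
    """Sum of the k smallest values."""
    return sum(sorted(values)[:k])


def robust_opt(points, X, gamma):
    """min over x in X of cost(F_x, x), with F_x the ceil(gamma*n) points closest to x."""
    k = math.ceil(gamma * len(points))
    return min(smallest_sum([abs(p - x) for p in points], k) for x in X)


def is_robust_median(points, X, Y, gamma, eps, alpha, beta):
    """Test whether Y is a (gamma, eps, alpha, beta)-median of F (Definition 9.2)."""
    if len(set(Y)) != beta:
        return False
    g = math.ceil((1 - eps) * gamma * len(points))
    # f(Y) = min_{y in Y} f(y); G keeps the g functions with the smallest f(Y)
    cost_G = smallest_sum([min(abs(p - y) for y in Y) for p in points], g)
    return cost_G <= alpha * robust_opt(points, X, gamma)


if __name__ == "__main__":
    pts = [0, 0, 0, 10]                  # one far-away outlier
    print(is_robust_median(pts, [0, 10], [0], gamma=0.75, eps=0.0, alpha=1.0, beta=1))
```

With $\gamma=3/4$ the outlier at 10 is ignored, so the center 0 passes the test while the center 10 does not.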
In the following (immediate) corollary, we use the same parameters as in Theorem 9.3.
Corollary 9.4 Let $Y\subseteq X$ be a set of size $\beta$ that contains a $((1-\varepsilon)\gamma,\varepsilon,\alpha)$-median of $S$. Then $Y$ is a $(\gamma,4\varepsilon,\alpha,\beta)$-median of $F$.

Suppose that for a small subset $S$ of $F$, we can compute a $(\gamma,\varepsilon,\alpha,\beta)$-median $Y$ for $\beta>1$. For $\beta=1$, we showed in Theorem 9.3 that if $S$ is a robust coreset for $F$, then $Y$ is a robust median for $F$. Unfortunately, this does not hold for $\beta>1$. However, if we use stronger assumptions on the set $S$, the following theorem proves that $Y$ is indeed a robust median in this case. More specifically, we will need $S$ to be an approximation of an enhanced version of the function set $F$. The enhanced function set corresponding to $F$ is one which takes as input subsets $Y\subseteq X$ (and naturally outputs the minimum evaluation over points in $Y$). In a later section, we will use the theorem below to construct efficient bicriteria approximation algorithms from inefficient ones.

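Theorem 9.5 Let $\beta\ge 1$ be an integer, $0<\varepsilon<1/10$, $0<\gamma\le 1$, and $\alpha>0$.

• Let $F$ be a set of functions from $X$ to $[0,\infty)$ such that $|F|\ge 1/(\varepsilon^2\gamma)$.

• For every $f\in F$ define $h_f:X\cup X^{\beta}\to[0,\infty)$ as $h_f(Y)=\min_{y\in Y}f(y)$.

• Let $S$ be a $(\gamma,\varepsilon)$-coreset for $H=\{h_f\mid f\in F\}$, such that $|S|\ge 1/(\varepsilon^2\gamma)$.

• Let $Y$ be a $((1-\varepsilon)\gamma,\varepsilon,\alpha,\beta)$-median for $S|_X$.
Then $Y$ is a $(\gamma,4\varepsilon,\alpha(1+10\varepsilon),\beta)$-median for $F|_X$.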

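Proof. Let $G\subseteq H$ denote the $\lceil(1-4\varepsilon)\gamma|F|\rceil$ functions $h_f\in H$ with the smallest value $h_f(Y)=\min_{y\in Y}f(y)$. Let $S_Y$ denote the $\lceil(1-2\varepsilon)\gamma|S|\rceil$ functions $f\in S$ with the smallest value $f(Y)$. Since $S$ is a $(\gamma,\varepsilon)$-coreset for $H$, it is also $(\gamma,2\varepsilon)$-good for $H$; see Definition 8.1. Hence,
$$(1-2\varepsilon)\cdot\frac{\mathrm{cost}(G,Y)}{|G|}\le\frac{\mathrm{cost}(S_Y,Y)}{|S_Y|}.\qquad(37)$$

For every $x\in X$, let $S_x$ denote the $\lceil(1-\varepsilon)\gamma|S|\rceil$ functions $f\in S$ with the smallest value $f(x)$. Let $z$ be the item that minimizes $\mathrm{cost}(S_z,z)$ over $z\in X$. The theorem assumes $|S|\ge 1/(\varepsilon^2\gamma)$. Therefore
$$|S_Y|\le(1-2\varepsilon)\gamma|S|+1=(1-\varepsilon)^2\gamma|S|+1-\varepsilon^2\gamma|S|\le(1-\varepsilon)^2\gamma|S|.$$
By this and the definition of $Y$,
$$\mathrm{cost}(S_Y,Y)\le\alpha\,\mathrm{cost}(S_z,z).\qquad(38)$$
For every $x\in X$, let $F_x$ denote the $\lceil\gamma|F|\rceil$ functions $f\in F$ with the smallest value $f(x)$. Let $x^*$ be a center that minimizes $\mathrm{cost}(F_x,x)$ over $x\in X$. By the definition of $z$,
$$\mathrm{cost}(S_z,z)\le\mathrm{cost}(S_{x^*},x^*).\qquad(39)$$
Since $S$ is a $(\gamma,\varepsilon)$-coreset for $H$, we have
$$\frac{\mathrm{cost}(S_{x^*},x^*)}{|S_{x^*}|}\le(1+\varepsilon)\cdot\frac{\mathrm{cost}(F_{x^*},x^*)}{|F_{x^*}|}.\qquad(40)$$
Combining (37), (38), (39) and (40) yields
$$\begin{aligned}
\mathrm{cost}(G,Y)&\le\frac{|G|\cdot\mathrm{cost}(S_Y,Y)}{(1-2\varepsilon)|S_Y|}\le\frac{|G|\,\alpha\,\mathrm{cost}(S_z,z)}{(1-2\varepsilon)|S_Y|}\le\frac{|G|\,\alpha\,\mathrm{cost}(S_{x^*},x^*)}{(1-2\varepsilon)|S_Y|}\\
&\le\frac{|S_{x^*}|}{|S_Y|}\cdot\frac{|G|}{|F_{x^*}|}\cdot\frac{(1+\varepsilon)\,\alpha\cdot\mathrm{cost}(F_{x^*},x^*)}{1-2\varepsilon}.
\end{aligned}\qquad(41)$$
Since $\varepsilon^2\gamma|S|\ge 1$, we have
$$|S_{x^*}|\le(1-\varepsilon)\gamma|S|+1\le\gamma|S|.\qquad(42)$$
Since $\varepsilon^2\gamma|F|\ge 1$, we have
$$|G|\le(1-4\varepsilon)\gamma|F|+1\le\gamma|F|.\qquad(43)$$
By plugging (43) and (42) into (41), and using $|S_Y|\ge(1-2\varepsilon)\gamma|S|$ and $|F_{x^*}|\ge\gamma|F|$, we infer that
$$\mathrm{cost}(G,Y)\le\frac{(1+\varepsilon)\,\alpha}{(1-2\varepsilon)^2}\cdot\mathrm{cost}(F_{x^*},x^*)\le(1+10\varepsilon)\,\alpha\cdot\mathrm{cost}(F_{x^*},x^*),$$
where in the last derivation we used the assumption $\varepsilon<1/10$ of the theorem. This proves that $Y$ is a $(\gamma,4\varepsilon,\alpha(1+10\varepsilon),\beta)$-median of $F|_X$. □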

We conclude this section with a lemma (similar in nature to Theorem 9.3) that addresses generalized range spaces.

Lemma 9.6 Let $(F,\mathcal{X})$ be a function space of dimension $d$. Let $\gamma\in(0,1]$, $\varepsilon\in(0,1/10)$, $\delta\in(0,1/10)$, and $\alpha>0$. Let $S$ be a random sample of

d + log ^ 



i.i.d. functions from $F$, where $c$ is a sufficiently large constant that is determined in the proof. Suppose that $x\in\mathcal{X}(S)$ is a $((1-\varepsilon)\gamma,\varepsilon,\alpha)$-median of $S$, and that $|F|\ge|S|$. Then, with probability at least $1-\delta$, $x$ is a $(\gamma,4\varepsilon,\alpha)$-median of $F$.

Proof. Let $x^*$ be a $(\gamma,0,1)$-median of $F$, and for every $S\subseteq F$ let $\mathcal{X}^+(S)=\mathcal{X}(S)\cup\{x^*\}$. Notice that $(F,\mathcal{X}^+)$ is a function space as in Definition 7.2. The number of ranges in $\mathcal{X}^+(S)$ is larger by at most $|S|$ than the number of ranges in $\mathcal{X}(S)$. Hence, $\dim(F,\mathcal{X}^+)\le d+1$. Hence, applying Theorem 6.3 and then Corollary 7.4 with $c$ large enough, we obtain that, with probability at least $1-\delta$, $S$ is an approximation of $F|_{\mathcal{X}^+(S)}$. Assume that this event indeed occurs. By Theorem 8.3, $S$ is also a $(\gamma,\varepsilon)$-coreset of $F|_{\mathcal{X}^+(S)}$.

Since $\mathcal{X}^+(S)\subseteq X$, we have that $x$ is a $((1-\varepsilon)\gamma,\varepsilon,\alpha)$-median of $S|_{\mathcal{X}^+(S)}$. Using Theorem 9.3 with $F=F|_{\mathcal{X}^+(S)}$ and $S=S|_{\mathcal{X}^+(S)}$, we obtain that $x$ is a $(\gamma,4\varepsilon,\alpha)$-median of $F|_{\mathcal{X}^+(S)}$. Since $x^*\in\mathcal{X}^+(S)$, we infer that $x$ is a $(\gamma,4\varepsilon,\alpha)$-median for $F$. □



9.1 Techniques for Computing a Robust Median

In this section, we use the results of Section 8 to reduce the problem of computing the robust median for a set of $n$ points to easier problems on smaller sets (usually, of size independent of $n$). We assume that sampling $s$ functions from $F$ uniformly can be done in time $O(s)$. Using Corollary 8.4, Theorem 9.3, and Corollary 9.4, we get the following corollary.






Corollary 9.7 Let $\varepsilon\in(0,1/10)$ and $\delta,\gamma\in(0,1]$. Let $F$ be a set of $n\ge 1/(\varepsilon\gamma)$ functions from $X$ to $[0,\infty)$. Suppose that we have an algorithm that receives a set $S\subseteq F$ of size
$$|S|=\Theta\left(\frac{\dim(F)+\log(1/\delta)}{\gamma^2\varepsilon^4}\right)$$
and returns a set $Y$, $|Y|\le\beta$, that contains a $((1-\varepsilon)\gamma,\varepsilon,\alpha)$-median of $S$ in time SlowMedian.

Then a $(\gamma,4\varepsilon,\alpha,\beta)$-median of $F$ can be computed, with probability at least $1-\delta$, in time $\mathrm{SlowMedian}+O(|S|)$.
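The key point of Corollary 9.7 is that $|S|$ does not depend on $n$. The helper below evaluates the size bound for concrete parameters. The constant hidden in $\Theta(\cdot)$ is not specified in the text, so it appears as an explicit argument `c` (an assumption), and the logarithm is taken to be natural; the function name is hypothetical.

```python
import math


def robust_sample_size(dim, delta, gamma, eps, c=1.0):
    """c * (dim + ln(1/delta)) / (gamma^2 * eps^4), rounded up.

    `c` stands in for the unspecified constant of the Theta(.) bound.
    n = |F| does not appear: the sample size is independent of n.
    """
    return math.ceil(c * (dim + math.log(1 / delta)) / (gamma ** 2 * eps ** 4))


if __name__ == "__main__":
    for eps in (0.5, 0.25, 0.1):
        print(eps, robust_sample_size(dim=3, delta=0.01, gamma=0.9, eps=eps))
```

The $\varepsilon^{-4}$ factor dominates quickly: halving $\varepsilon$ multiplies the bound by 16.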

The reduction stated in the corollary above (approximately) preserves the quality of the median with respect to $\gamma$. In some cases, it is useful to show a connection between medians for $S$ with $\gamma=1$ and medians for $F$ with arbitrary $\gamma$. This point is addressed in the next corollary.

Corollary 9.8 Let $\varepsilon\in(0,1/4)$ and $\delta,\gamma\in(0,1]$. Let $F$ be a set of $n\ge 1/(\varepsilon\gamma)$ functions from a set $X$ to $[0,\infty)$. Suppose that we have an algorithm that receives a set $S\subseteq F$ of size
$$|S|=\Theta\left(\frac{\dim(F)+\log(1/\delta)}{\gamma^2\varepsilon^4}\right)$$
and returns a $(1,\varepsilon,\alpha)$-median of $S$ in time SlowOneEpsMedian. Then a $(\gamma,4\varepsilon,\alpha)$-median of $F$ can be computed, with probability at least $1-\delta$, in time
$$\mathrm{Median}=O\left(\mathrm{SlowOneEpsMedian}\cdot t\cdot\exp\{2\gamma|S|\ln|S|\}\right),$$
where $t$ is the time it takes to compute $f(x)$ for a pair $f\in F$ and $x\in X$.

Proof. We first compute a $((1-\varepsilon)\gamma,\varepsilon,\alpha)$-median $z^*$ for $S$. Let $x^*$ be a $((1-\varepsilon)\gamma,0,1)$-median for $S$. Let $T^*$ be the $\gamma'=\lceil(1-\varepsilon)\gamma|S|\rceil$ functions $f\in S$ with the smallest value $f(x^*)$. Let $y$ be a $(1,0)$-median of $T^*$. Hence, $\mathrm{cost}(T^*,y)\le\mathrm{cost}(T^*,x^*)$. Let $z$ be a $(1,\varepsilon)$-median of $T^*$. For every $x\in X$, let $T_x$ denote the $\lceil(1-\varepsilon)\gamma'\rceil$ functions $f\in S$ with the smallest value $f(x)$. Therefore, $\mathrm{cost}(T_z,z)\le\mathrm{cost}(T^*,y)$.

We compute a $(1,\varepsilon,\alpha)$-median for every set $T\subseteq S$ of size $\gamma'$, and choose $z^*$ to be the median that minimizes $\mathrm{cost}(T_{z^*},z^*)$. Hence, $\mathrm{cost}(T_{z^*},z^*)\le\alpha\,\mathrm{cost}(T_z,z)$. Combining the last equations yields
$$\mathrm{cost}(T_{z^*},z^*)\le\alpha\,\mathrm{cost}(T_z,z)\le\alpha\,\mathrm{cost}(T^*,y)\le\alpha\,\mathrm{cost}(T^*,x^*).$$
Hence, $z^*$ is a $((1-\varepsilon)\gamma,\varepsilon,\alpha)$-median for $S$, as desired.

We compute $z^*$ using exhaustive search over all the (at most $\exp\{2\gamma|S|\ln|S|\}$) subsets of $S$ of size $|T^*|$. The proof now follows by applying Corollary 9.7 with $\beta=1$. □
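The exhaustive search in this proof can be sketched as follows, again under illustrative assumptions not made in the paper: points on the real line with $f_p(x)=|p-x|$, a finite candidate set $X$, and an exact 1-median over $X$ standing in for the black box that returns a $(1,\varepsilon,\alpha)$-median (so $\alpha=1$ here). The loop visits every subset of size $\gamma'$, which is where the $\exp\{2\gamma|S|\ln|S|\}$ factor comes from, so it is only meant for very small $S$; the function names are hypothetical.

```python
import itertools
import math


def trimmed_cost(points, x, k):
    """cost(T_x, x): sum of the k smallest distances |p - x|."""
    return sum(sorted(abs(p - x) for p in points)[:k])


def exhaustive_robust_median(S, X, gamma, eps):
    """Return (z*, cost(T_{z*}, z*)) by brute force over all subsets of size gamma'."""
    g_prime = math.ceil((1 - eps) * gamma * len(S))   # gamma' = |T*|
    t = math.ceil((1 - eps) * g_prime)                 # |T_x|
    best, best_cost = None, math.inf
    for T in itertools.combinations(S, g_prime):
        # Stand-in for the (1, eps, alpha)-median oracle: exact 1-median of T over X.
        z = min(X, key=lambda c: sum(abs(p - c) for p in T))
        cost = trimmed_cost(S, z, t)
        if cost < best_cost:
            best, best_cost = z, cost
    return best, best_cost


if __name__ == "__main__":
    S = [0, 1, 2, 100]
    print(exhaustive_robust_median(S, X=S, gamma=0.75, eps=0.0))  # (1, 2)
```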



10 Centroid Sets

In this section we define and analyze the notion of a centroid set. Roughly speaking, a centroid set is a subset of the centers $X$ that includes a robust median for every subset $S\subseteq F$. The notion of centroid sets will later be tied to that of weak coresets, as outlined in the Introduction.

Recall that by Corollary 9.8, in order to compute a $(\gamma,4\varepsilon,\alpha,\beta)$-median of $F$ for $0<\gamma\le 1$ in time independent of $n$, it suffices to compute a $(1,\varepsilon,\alpha)$-median for a small set $S$ in some finite time (even exponential in $|S|$).

Definition 10.1 Let $F$ be a set of functions from $X$ to $[0,\infty)$. A $(\gamma,\varepsilon,\alpha,\beta)$-centroid set for $F$ is a set $\mathrm{cent}$ that contains as an element a $(\gamma,\varepsilon,\alpha,\beta)$-median of $S$, for every $S\subseteq F$. A $(\gamma,\varepsilon,\alpha)$-centroid set is a shorthand for a $(\gamma,\varepsilon,\alpha,1)$-centroid set.

We start with the following simple lemmas, which follow directly from our definitions.

Lemma 10.2 Let $F$ be a set of functions from $X$ to $[0,\infty)$. Let $\alpha,\beta,\gamma>0$ be parameters. Then, for every two parameters $1>\varepsilon'\ge\varepsilon\ge 0$, a $(\gamma,\varepsilon,\alpha,\beta)$-median of $F$ is also a $(\gamma,\varepsilon',\alpha,\beta)$-median of $F$.

Lemma 10.3 Let $F$ be a set of non-negative functions, $\gamma\in(0,1]$ and $\varepsilon',\gamma'\in[0,1]$. Then every $(\gamma,0,\alpha,\beta)$-centroid set of $F$ is a $(\gamma',\varepsilon',\alpha,\beta)$-centroid set of $F$.

Proof. Let $\mathrm{cent}$ be a $(\gamma,0,\alpha,\beta)$-centroid set for $F$. Let $S\subseteq F$. We will show that $\mathrm{cent}$ includes a $(\gamma',0,\alpha,\beta)$-median for $S$. Then, using Lemma 10.2 and Definition 10.1, we can conclude our assertion. Let $x^*$ be a $(\gamma',0,1)$-median of $S$. Let $m=\lceil\gamma'|S|\rceil$, and let $G$ denote the $\lfloor(m-1)/\gamma\rfloor+1$ functions $f\in S$ with the smallest value $f(x^*)$. By Definition 10.1, $\mathrm{cent}$ contains a $(\gamma,0,\alpha,\beta)$-median $Y$ for $G$. Let $H$ denote the $\lceil\gamma|G|\rceil$ functions $f\in S$ with the smallest value $f(x^*)$. Let $V$ denote the $\lceil\gamma|G|\rceil$ functions $f\in S$ with the smallest value $f(Y)$. Hence,
$$\mathrm{Cost}(V,Y)\le\alpha\,\mathrm{cost}(H,x^*).\qquad(44)$$
By denoting $a=|G|-(m-1)/\gamma$, and noting that $0<a\le 1$, we have
$$|V|=|H|=\lceil\gamma|G|\rceil=\left\lceil\gamma\left(\frac{m-1}{\gamma}+a\right)\right\rceil=\lceil m-1+\gamma a\rceil=m=\lceil\gamma'|S|\rceil,$$
where in the last derivation we used the assumption $\gamma>0$. By the previous equation and (44), we have that $Y$ is a $(\gamma',0,\alpha,\beta)$-median for $S$. Using Lemma 10.2, $Y$ is also a $(\gamma',\varepsilon',\alpha,\beta)$-median for $S$. Since the proof holds for every $S\subseteq F$, we conclude that $\mathrm{cent}$ is a $(\gamma',\varepsilon',\alpha,\beta)$-centroid set for $F$. □
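The counting identity at the heart of this proof, $\lceil\gamma|G|\rceil=m$ with $|G|=\lfloor(m-1)/\gamma\rfloor+1$ and $0<\gamma\le 1$, is easy to get wrong by one, so a mechanical check may help. The sketch below verifies it, together with the claim $0<a\le 1$, using exact rational arithmetic from Python's `fractions` module over a grid of values of $m$ and $\gamma$. It checks the arithmetic only, not the rest of the lemma; the helper names are hypothetical.

```python
import math
from fractions import Fraction


def group_size(m, gamma):
    """|G| = floor((m - 1) / gamma) + 1, computed exactly for a rational gamma."""
    return math.floor(Fraction(m - 1) / gamma) + 1


def check_identity(max_m=50, denominators=range(1, 13)):
    """Check ceil(gamma * |G|) == m for all m <= max_m and gamma = p/q in (0, 1]."""
    for q in denominators:
        for p in range(1, q + 1):
            gamma = Fraction(p, q)
            for m in range(1, max_m + 1):
                G = group_size(m, gamma)
                a = G - Fraction(m - 1) / gamma
                assert 0 < a <= 1                 # the claim 0 < a <= 1 in the proof
                assert math.ceil(gamma * G) == m  # |V| = |H| = m
    return True


if __name__ == "__main__":
    print(check_identity())
```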

Lemma 10.4 Let $F$ be a set of functions from $X$ to $[0,\infty)$. Let $\mathrm{cent}$ be a $(1,0,\alpha,\beta)$-centroid set for $F$. For every $f\in F$ define $f_k$ as the function that, for $i\le k$, takes as input $x=(x_1,\ldots,x_i)\in X^1\cup\cdots\cup X^k$ and returns $f_k(x)=\min_{1\le j\le i}f(x_j)$. Let $F_k=\{f_k\mid f\in F\}$.
For every $k$-tuple $Y=(Y_1,\ldots,Y_k)\in\mathrm{cent}^k$, let
$$\Pi(Y)=\{(x_1,\ldots,x_k),(x_{k+1},\ldots,x_{2k}),\ldots\}\in(X^k)^{\beta}$$
be a partition of $Y_1\cup\cdots\cup Y_k$ into $\beta$ disjoint sets, each of size at most $k$. Let $\mathrm{cent}_k=\{\Pi(Y)\mid Y\in\mathrm{cent}^k\}$. Then $\mathrm{cent}_k$ is a $(1,0,\alpha,\beta)$-centroid set of size $|\mathrm{cent}_k|=|\mathrm{cent}|^k$ for $F_k$.
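The construction of $\mathrm{cent}_k$ is mechanical. The sketch below builds it from a given $\mathrm{cent}$ (a list of $\beta$-element sets of items, here numbers) and evaluates $f_k$ on the resulting tuples. The particular partition used for $\Pi(Y)$, consecutive chunks of the sorted union, is an arbitrary choice made for illustration; the lemma allows any partition into at most $\beta$ groups of size at most $k$. Since different $Y$ can give the same $\Pi(Y)$, the computed set may have fewer than $|\mathrm{cent}|^k$ elements. All names are hypothetical.

```python
import itertools


def f_k(f, xs):
    """f_k((x_1, ..., x_i)) = min_j f(x_j) for a tuple of at most k items."""
    return min(f(x) for x in xs)


def pi(Y, k):
    """One partition Pi(Y) of Y_1 ∪ ... ∪ Y_k into tuples of at most k items."""
    items = sorted(set().union(*Y))
    return tuple(tuple(items[i:i + k]) for i in range(0, len(items), k))


def cent_k(cent, k):
    """{Pi(Y) | Y in cent^k} for a centroid set `cent` of beta-element sets."""
    return {pi(Y, k) for Y in itertools.product(cent, repeat=k)}


if __name__ == "__main__":
    cent = [frozenset({0, 1}), frozenset({5, 6})]   # beta = 2
    f = lambda x: abs(x - 4)
    for Z in sorted(cent_k(cent, k=2)):
        # min over the tuples in Z of f_k equals min over the union Y_1 ∪ ... ∪ Y_k
        print(Z, min(f_k(f, t) for t in Z))
```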

Proof. Let $S_k\subseteq F_k$. Let $x^*=(x^*_1,\ldots,x^*_k)\in X^k$ be a $(1,0)$-median for $S_k$, and let $T=\{f\in F\mid f_k\in S_k\}$ be the corresponding functions in $F$. Let $(T_1,\ldots,T_k)$ be a partition of $T$ such that $f(x^*_i)=f_k(x^*)$ for every $f\in T_i$ and $1\le i\le k$ (ties broken arbitrarily). Fix $i$, $1\le i\le k$. Let $Y_i=\{x_1,\ldots,x_\beta\}\in\mathrm{cent}$ be a $(1,0,\alpha,\beta)$-median for $T_i$. Hence,
$$\mathrm{Cost}(T_i,Y_i)\le\alpha\,\mathrm{cost}(T_i,x^*_i).\qquad(45)$$
Let $Y=(Y_1,\ldots,Y_k)\in\mathrm{cent}^k$. Summing (45) over every $1\le i\le k$ yields
$$\mathrm{Cost}(S_k,\Pi(Y))=\sum_{f_k\in S_k}\min_{y\in\Pi(Y)}f_k(y)\le\sum_{i=1}^{k}\mathrm{Cost}(T_i,Y_i)\le\alpha\sum_{i=1}^{k}\mathrm{cost}(T_i,x^*_i)=\alpha\,\mathrm{cost}(S_k,x^*).$$
Hence, $\Pi(Y)$ is a $(1,0,\alpha,\beta)$-median for $S_k$. Since $\Pi(Y)\in\mathrm{cent}_k$, we conclude that $\mathrm{cent}_k$ is a $(1,0,\alpha,\beta)$-centroid set for $F_k$. □

Algorithm BICRITERIA$(F,\varepsilon,\alpha,\beta)$

1 $i\leftarrow 1$; $F_1\leftarrow F$

2 while $|F_i|>10/\varepsilon$

3   $Y_i\leftarrow$ A $(3/4,\varepsilon,\alpha,\beta)$-median of $F_i$

4   $G_i\leftarrow$ The set of the $\lceil(1-5\varepsilon)\cdot 3|F_i|/4\rceil$ functions $f\in F_i$ with the smallest value $f(Y_i)$

5   $F_{i+1}\leftarrow F_i\setminus G_i$

6   $i\leftarrow i+1$

7 $Y_i\leftarrow$ A $(1,0,\alpha,\beta)$-median of $F_i$

8 $G_i\leftarrow F_i$

9 return $\{(G_1,Y_1),\ldots,(G_i,Y_i)\}$

Fig. 5: The algorithm BICRITERIA. (A slight change in the algorithm compared to that presented in the Introduction.)
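A compact executable sketch of Fig. 5 follows. It is an illustration under assumptions that are not part of the paper: functions are $f_p(x)=|p-x|$ for points $p$ on the real line, $X$ is a finite list of candidate centers, and the two oracles are brute-force searches over $X$ that return exact single-center solutions (so $\alpha=1$ and $\beta=1$; an exact $(3/4,0,1)$-median is also a $(3/4,\varepsilon,1)$-median by Lemma 10.2). The oracle names are hypothetical and can be replaced by any routine with the guarantees required in Lines 3 and 7.

```python
import math
import random


def trimmed_median(points, X, gamma):
    """Line 3 stand-in (assumed oracle): the x in X minimizing the sum of the
    ceil(gamma * n) smallest distances |p - x|, returned as a 1-element tuple."""
    k = math.ceil(gamma * len(points))
    return (min(X, key=lambda x: sum(sorted(abs(p - x) for p in points)[:k])),)


def exact_median(points, X):
    """Line 7 stand-in (assumed oracle): the x in X minimizing sum |p - x|."""
    return (min(X, key=lambda x: sum(abs(p - x) for p in points)),)


def bicriteria(F, X, eps):
    """Return [(G_1, Y_1), ..., (G_i, Y_i)] following Fig. 5 (alpha = beta = 1 here)."""
    Fi, out = list(F), []
    while len(Fi) > 10 / eps:                                    # Line 2
        Y = trimmed_median(Fi, X, 0.75)                          # Line 3
        g = math.ceil((1 - 5 * eps) * 3 * len(Fi) / 4)           # Line 4: |G_i|
        order = sorted(range(len(Fi)), key=lambda j: min(abs(Fi[j] - y) for y in Y))
        chosen = set(order[:g])
        out.append(([Fi[j] for j in order[:g]], Y))              # (G_i, Y_i)
        Fi = [Fi[j] for j in range(len(Fi)) if j not in chosen]  # Line 5
    out.append((Fi, exact_median(Fi, X)))                        # Lines 7-8
    return out


if __name__ == "__main__":
    random.seed(2)
    F = [random.gauss(0, 1) for _ in range(1500)] + [random.gauss(8, 1) for _ in range(500)]
    X = [x / 4 for x in range(-20, 49)]
    B = bicriteria(F, X, eps=0.02)
    print(len(B), sorted({y for _, Y in B for y in Y}))
```

The sets $G_i$ returned by the sketch partition $F$, and with $\varepsilon\le 1/100$ each iteration removes more than half of the remaining functions, matching the $\log_2 n$ bound used in the proof of Theorem 11.2.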

Lemma 10.5 Let $F$ and $F_k$ be defined as in Lemma 10.4. Let $\gamma\in(0,1]$, $\varepsilon\in[0,1)$ and $\alpha>0$. Let $\mathrm{cent}$ be a $(1,0,\alpha)$-centroid set for $F$. Then there is $x\in\mathrm{cent}^k$ which is a $(\gamma,\varepsilon,\alpha)$-median for $F_k$.

Proof. Let $x^*=(x^*_1,\ldots,x^*_k)$ be a $(\gamma,0)$-median for $F_k$. Let $H_k$ denote the $\lceil\gamma|F_k|\rceil$ functions $f_k\in F_k$ with the smallest value $f_k(x^*)$. Let $G=\{f\in F\mid f_k\in H_k\}$. Let $(G_1,\ldots,G_k)$ be a partition of $G$ such that $f(x^*_i)=f_k(x^*)$ for every $f\in G_i$ and $1\le i\le k$ (ties broken arbitrarily).

For every $1\le i\le k$, let $x_i\in\mathrm{cent}$ be a $(1,0,\alpha)$-median for $G_i$. Hence, $\mathrm{cost}(G_i,x_i)\le\alpha\,\mathrm{cost}(G_i,x^*_i)$. Let $x=(x_1,\ldots,x_k)\in\mathrm{cent}^k$. We thus have
$$\mathrm{cost}(H_k,x)\le\sum_{i=1}^{k}\mathrm{cost}(G_i,x_i)\le\alpha\sum_{i=1}^{k}\mathrm{cost}(G_i,x^*_i)=\alpha\,\mathrm{cost}(H_k,x^*).\qquad(46)$$
That is, $x$ is a $(\gamma,0,\alpha)$-median for $F_k$. Hence, by Lemma 10.2, $x$ is also a $(\gamma,\varepsilon,\alpha)$-median for $F_k$.

□

11 From $(\gamma,\varepsilon,\alpha,\beta)$-medians to bicriteria approximations

Definition 11.1 (Bicriteria $(\alpha,\beta)$-approximation) Let $F$ be a set of functions from $X$ to $[0,\infty)$. An $(\alpha,\beta)$-bicriteria approximation for $F$ is a $(1,0,\alpha,\beta)$-median of $F$.

Let $F$ be a set of $n$ functions from some set $X$ to $[0,\infty)$. Recall that for a set $X'\subseteq X$, we define $\mathrm{Cost}(F,X')=\sum_{f\in F}\min_{x\in X'}f(x)$. In this section we present the algorithm BICRITERIA that receives a set $F$ of $n$ functions and a parameter $\varepsilon\in(0,1)$. It returns a set $X'\subseteq X$ with $|X'|=O(\log n)$ such that $\mathrm{Cost}(F,X')\le(1+\varepsilon)\cdot\min_{x\in X}\mathrm{cost}(F,x)$. See Fig. 5. The algorithm BICRITERIA uses (calls) the following two algorithms:

• An algorithm that computes a robust median for a given subset of $F$; see Definition 9.2.

• A (possibly inefficient) algorithm that receives a set $S\subseteq F$ of size $O(1/\varepsilon)$, and returns a set $Y$ such that $\mathrm{Cost}(S,Y)\le(1+\varepsilon)\min_{x\in X}\mathrm{cost}(S,x)$.
The second algorithm receives an input of size independent of $n$, and thus can be inefficient. Algorithms for computing a robust median of $n$ functions in time linear in $n$ are presented in Section 9.1.



Theorem 11.2 Let $F$ be a set of $n$ functions from a set $X$ to $[0,\infty)$, and let $\alpha,\beta>0$ and $0<\varepsilon<1$. Let $B$ be the set that is returned by the algorithm BICRITERIA$(F,\varepsilon/100,\alpha,\beta)$; see Fig. 5. Then $Z=\bigcup_{(G,Y)\in B}Y$ is a $((1+\varepsilon)\alpha,\beta\log n)$-approximation for $F$. That is, $|Z|\le\beta\log_2 n$ and
$$\mathrm{Cost}(F,Z)\le(1+\varepsilon)\alpha\cdot\min_{x\in X}\mathrm{cost}(F,x).$$

Proof. Since $|F_i|$ is reduced by more than half in each "while" iteration, there are at most $\log_2 n$ iterations. In every iteration we compute $Y$ such that $|Y|\le\beta$; hence $|Z|\le\beta\log n$. It remains to bound $\mathrm{Cost}(F,Z)$.

Let $B$ be the set that is returned by a call to the algorithm BICRITERIA$(F,\varepsilon,\alpha,\beta)$. We will prove that
$$\mathrm{Cost}(F,Z)\le\sum_{(G,Y)\in B}\mathrm{Cost}(G,Y)\le(1+100\varepsilon)\alpha\cdot\min_{x\in X}\mathrm{cost}(F,x),\qquad(47)$$
which suffices to prove our assertion.

For every x ^ X, let Fx denote the [3|F|/4] functions f ^ F with the smallest value f{x). Let x* be 
an item that minimizes cost(F^, x) over all x G X. Fix i, I < i < \B\ — 1. Let F* denote the [3|Fj|/4] 
functions f ^ Fi with the smallest value f{x*). Since Yi is a (3/4, e, a, /3)-median of Fi, we have (by the 
definition of Gi) that 

Cost{Gi,Yi) < acost{F*,x*) < cost{F*,x*) . (48) 

We denote the functions in $F$ by $F = \{f_1, \ldots, f_n\}$, such that $f_a(x^*) \le f_b(x^*)$ for every $1 \le a < b \le n$, where ties are broken arbitrarily. Let

$$U_i = \{f_1, \ldots, f_{n-|F_i|}\}, \qquad V_i = \{f_{n-|F_i|+1}, \ldots, f_{n-|F_i|+|F_i^*|}\}. \tag{49}$$

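During the first $(i-1)$ "while" iterations, an overall of $n - |F_i|$ functions were removed from $F$. Hence,

$$|(U_i \cup V_i) \cap F_i| \ge |U_i| + |V_i| - (n - |F_i|) = |V_i| = |F_i^*|.$$

Since $U_i \cup V_i$ consists of the functions of $F$ with the smallest values $f(x^*)$, we thus have $F_i^* \subseteq U_i \cup V_i$. The set $V_i$ contains the $|V_i| = |F_i^*|$ functions $f \in U_i \cup V_i$ with the largest values $f(x^*)$. Hence, $\mathrm{cost}(F_i^*, x^*) \le \mathrm{cost}(V_i, x^*)$. Combining (48) with the last inequality yields

$$\mathrm{cost}(G_i, Y_i) \le \alpha\cdot\mathrm{cost}(F_i^*, x^*) \le \alpha\cdot\mathrm{cost}(V_i, x^*).$$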

By Lines 7 and 8 of the algorithm, we have

$$\mathrm{cost}(G_{|B|}, Y_{|B|}) = \mathrm{cost}(F_{|B|}, Y_{|B|}) \le \alpha\cdot\mathrm{cost}(F_{|B|}, x^*). \tag{50}$$

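Let $V_{|B|} = F_{|B|}$. Using the last three inequalities, we obtain

$$\sum_{(G,Y)\in B}\mathrm{cost}(G, Y) \le \alpha\cdot\mathrm{cost}(F_{|B|}, x^*) + \sum_{i=1}^{|B|-1}\mathrm{cost}(G_i, Y_i) \le \alpha\sum_{i=1}^{|B|}\mathrm{cost}(V_i, x^*). \tag{51}$$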
Let $1 \le i \le |B|-1$. We now prove that

$$|V_{i+1} \cap V_i| \le 24\varepsilon|V_{i+1}|, \tag{52}$$

and that for every integer $j$ such that $i+2 \le j \le |B|$, we have

$$V_j \cap V_i = \emptyset. \tag{53}$$

Indeed, let $j$ be an integer such that $i+1 \le j \le |B|$, and assume $V_j \cap V_i \ne \emptyset$. We have $|F_j| = |F_i| - \sum_{k=i}^{j-1}|G_k|$. Using the last equation and (49), we get

$$|V_j \cap V_i| \le n - |F_i| + |F_i^*| - (n - |F_j| + 1) + 1 = |F_j| - |F_i| + |F_i^*| = |F_i^*| - \sum_{k=i}^{j-1}|G_k|. \tag{54}$$

We have $|G_i| \ge (1-5\varepsilon)\cdot|F_i^*| \ge |F_i^*|/(1+6\varepsilon)$, where in the last derivation we use the assumption $\varepsilon \le 1/100$ from the beginning of this proof. Hence,

$$|F_i^*| \le (1+6\varepsilon)|G_i| = |G_i| + 6\varepsilon|G_i|. \tag{55}$$

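Since $i \le |B|-1$, we have by Line 2 that $|F_i| \ge 10/\varepsilon$. We thus have

$$|G_i| \le (1-5\varepsilon)\frac{3|F_i|}{4} + 1 \le \frac{3|F_i|}{4} \le 3|F_{i+1}|,$$

where the middle inequality uses $\frac{15\varepsilon}{4}|F_i| \ge 1$, and the last one uses $|F_{i+1}| = |F_i| - |G_i| \ge |F_i|/4$. Using the last two equations, we obtain

$$|F_i^*| \le |G_i| + 6\varepsilon|G_i| \le |G_i| + 18\varepsilon|F_{i+1}|.$$

Combining the last equation with (54) yields

$$|V_j \cap V_i| \le |G_i| + 18\varepsilon|F_{i+1}| - \sum_{k=i}^{j-1}|G_k|. \tag{56}$$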

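We have $|F_{i+1}^*| \ge 3|F_{i+1}|/4$, i.e., $|F_{i+1}| \le 4|F_{i+1}^*|/3$. Thus, substituting $j = i+1$ in (56) yields

$$|V_{i+1} \cap V_i| \le 18\varepsilon|F_{i+1}| \le 24\varepsilon|F_{i+1}^*| = 24\varepsilon|V_{i+1}|,$$

which proves (52). If $j \ge i+2$, we have by (56)

$$|V_j \cap V_i| \le 18\varepsilon|F_{i+1}| - |G_{i+1}| \le 18\varepsilon|F_{i+1}| - (1-5\varepsilon)\frac{3|F_{i+1}|}{4} < 0,$$

which contradicts the fact that $|V_j \cap V_i| > 0$. Hence, the assumption $V_j \cap V_i \ne \emptyset$ implies $j = i+1$. This proves (53).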

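The size facts used in (55), (56) and in the proof of (53) can be checked mechanically. The short script below is an illustrative check, not part of the proof: it assumes $\varepsilon = 1/100$, $|G_i| = \lceil(1-5\varepsilon)3|F_i|/4\rceil$ and $|F_{i+1}| = |F_i| - |G_i|$, and verifies $|G_i| \le 3|F_{i+1}|$ together with the two numeric inequalities over a range of sizes $|F_i| \ge 10/\varepsilon$, using exact rational arithmetic. The function name is hypothetical.

```python
import math
from fractions import Fraction


def size_facts_hold(eps: Fraction, n_max: int) -> bool:
    """Check the size inequalities for every |F_i| in [10/eps, n_max]."""
    # (1 - 5 eps)|F_i^*| >= |F_i^*| / (1 + 6 eps), used just before (55)
    if (1 - 5 * eps) * (1 + 6 * eps) < 1:
        return False
    # 18 eps |F_{i+1}| - (1 - 5 eps) * 3|F_{i+1}|/4 < 0, used to prove (53)
    if 18 * eps >= (1 - 5 * eps) * Fraction(3, 4):
        return False
    for n in range(math.ceil(10 / eps), n_max + 1):
        g = math.ceil((1 - 5 * eps) * Fraction(3 * n, 4))   # |G_i|
        n_next = n - g                                      # |F_{i+1}|
        if g > 3 * n_next:                                  # |G_i| <= 3|F_{i+1}|
            return False
    return True
```

The same check fails for $\varepsilon = 1/10$, which illustrates why the proof needs a small $\varepsilon$.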
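Using (53) with (51), we infer that

$$\begin{aligned}\sum_{(G,Y)\in B}\mathrm{cost}(G, Y) &\le \alpha\sum_{i=1}^{|B|}\mathrm{cost}(V_i, x^*)\\ &= \alpha\cdot\mathrm{cost}\Big(\bigcup_{1\le i\le|B|}V_i, x^*\Big) + \alpha\sum_{i=1}^{|B|-1}\mathrm{cost}(V_{i+1}\cap V_i, x^*)\\ &\le \alpha\cdot\mathrm{cost}(F, x^*) + \alpha\sum_{i=1}^{|B|-1}\mathrm{cost}(V_{i+1}\cap V_i, x^*).\end{aligned}\tag{57}$$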



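We have $|F_{i+1}| \le 4|F_{i+1}^*|/3$. The set $V_{i+1} \cap V_i$ contains the functions $f \in V_{i+1}$ with the smallest values $f(x^*)$. Hence, Equation (52) implies

$$\mathrm{cost}(V_{i+1}\cap V_i, x^*) \le \frac{|V_{i+1}\cap V_i|}{|V_{i+1}|}\cdot\mathrm{cost}(V_{i+1}, x^*) \le 24\varepsilon\cdot\mathrm{cost}(V_{i+1}, x^*) = 24\varepsilon\cdot\mathrm{cost}(V_{i+1}\setminus V_i, x^*) + 24\varepsilon\cdot\mathrm{cost}(V_{i+1}\cap V_i, x^*).$$

That is,

$$(1-24\varepsilon)\cdot\mathrm{cost}(V_{i+1}\cap V_i, x^*) \le 24\varepsilon\cdot\mathrm{cost}(V_{i+1}\setminus V_i, x^*).$$

Since $\varepsilon \le 1/100$, combining the previous inequality with (57) yields

$$\begin{aligned}\sum_{(G,Y)\in B}\mathrm{cost}(G, Y) &\le \alpha\cdot\mathrm{cost}(F, x^*) + \alpha\sum_{i=1}^{|B|-1}\frac{24\varepsilon}{1-24\varepsilon}\cdot\mathrm{cost}(V_{i+1}\setminus V_i, x^*)\\ &\le \alpha\cdot\mathrm{cost}(F, x^*) + 100\varepsilon\alpha\sum_{i=1}^{|B|-1}\mathrm{cost}(V_{i+1}\setminus V_i, x^*)\\ &\le \alpha(1+100\varepsilon)\cdot\mathrm{cost}(F, x^*),\end{aligned}$$

where in the last derivation we used (53), which implies that the sets $V_{i+1}\setminus V_i$ are pairwise disjoint subsets of $F$. This proves (47), as desired. □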

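Two elementary facts carry the last part of the proof: the $m$ smallest of $N$ nonnegative numbers sum to at most $m/N$ times the total (this gives the first inequality after (57)), and $24\varepsilon/(1-24\varepsilon) \le 100\varepsilon$ for $0 < \varepsilon \le 1/100$. The snippet below checks both on sample inputs with exact arithmetic; the function names are illustrative only.

```python
from fractions import Fraction
from typing import Sequence


def smallest_sum_bound_holds(values: Sequence[int], m: int) -> bool:
    """For nonnegative values: (sum of the m smallest) <= (m / N) * (sum of all N)."""
    v = sorted(values)
    return Fraction(sum(v[:m])) <= Fraction(m, len(v)) * sum(v)


def constant_bound_holds(eps: Fraction) -> bool:
    """Check 24 eps / (1 - 24 eps) <= 100 eps, the constant used after (57)."""
    return 24 * eps / (1 - 24 * eps) <= 100 * eps
```

The constant bound holds exactly when $1 - 24\varepsilon \ge 0.24$, so it already fails at $\varepsilon = 1/30$.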
In what follows we restate Theorem 4.7 and present its proof. 

Theorem 11.3 Let $F$ be a set of $n$ functions from a set $X$ to $[0,\infty)$. Let $0 < \varepsilon, \delta < 1$ and $\alpha, \beta > 0$. Then a set $Z \subseteq X$ of size $|Z| \le \beta\log_2 n$ can be computed such that, with probability at least $1-\delta$,

$$\mathrm{cost}(F, Z) \le (1+\varepsilon)\alpha\cdot\min_{x\in X}\mathrm{cost}(F, x).$$

This takes time

$$\mathrm{Bicriteria} = O(1)\cdot\left(nt + \log^2 n\cdot\mathrm{SlowMedian} + \mathrm{SlowEpsApprox}\right),$$

where:

• $t$ is an upper bound on the time it takes to compute $f(Y)$ for a pair $f \in F$ and $Y \subseteq X$ such that $|Y| \le \beta$.

• $O(\mathrm{SlowMedian})$ is the time it takes to compute, with probability at least $1-\delta/2$, a $(3/4, \varepsilon, \alpha, \beta)$-median for a set $F' \subseteq F$.

• $O(\mathrm{SlowEpsApprox})$ is the time it takes to compute a $(1, 0, \alpha, \beta)$-median for a set $F' \subseteq F$ of size $|F'| = O(1/\varepsilon)$.

Proof. We present a randomized implementation of the algorithm BICRITERIA$(F, \varepsilon, \alpha, \beta)$ in Fig. 5. The implementation succeeds with probability at least $1-\delta$, and its running time is Bicriteria, as stated in the theorem. By Theorem 11.2, this proves the theorem.

Indeed, let $B$ denote the output of a call to BICRITERIA$(F, \varepsilon, \alpha, \beta)$. Put $i$, $1 \le i \le |B|$. Suppose that we have an algorithm MEDIAN$(F_i, \delta')$ that computes, with probability at least $1-\delta'$, a $(3/4, \varepsilon, \alpha, \beta)$-median $Y_i$ for $F_i$. Calling MEDIAN$(F_i, \delta/\log n)$ in each of the $O(\log n)$ times that Line 3 of the algorithm BICRITERIA is executed would yield an implementation of BICRITERIA that succeeds with probability at least $1-\delta$. However, this implementation uses a value $\delta'$ that depends on $n$.
Instead, in order to compute $Y_i$, we call MEDIAN$(F_i, \delta/2)$ $i$ times, and denote by $x_1, \ldots, x_i$ the returned sets. Note that, here, each $x_j$ is a subset of size $\beta$ from $X$. For each such set $x_j$, $1 \le j \le i$, let $G_j$ denote the $\lceil(1-5\varepsilon)3|F_i|/4\rceil$ functions $f \in F_i$ with the smallest value $f(x_j)$. Among the pairs $(G_1, x_1), \ldots, (G_i, x_i)$, let $(G_i, Y_i)$ denote the pair that minimizes $\mathrm{cost}(G_j, x_j)$. The algorithm then continues to Line 4 of the algorithm BICRITERIA using this construction of $Y_i$ and $G_i$.

The probability that $Y_i$ is a $(3/4, \varepsilon, \alpha, \beta)$-median of $F_i$ is at least the probability that one or more of the items $x_1, \ldots, x_i$ is a $(3/4, \varepsilon, \alpha, \beta)$-median of $F_i$. Hence, $Y_i$ is a $(3/4, \varepsilon, \alpha, \beta)$-median of $F_i$ with probability at least $1-(\delta/2)^i$. By Theorem 11.2 there are at most $|B| \le \log_2 n$ iterations. Hence, the probability that $Y_i$ is a $(3/4, \varepsilon, \alpha, \beta)$-median in the $i$th iteration, for every $i$, $1 \le i \le |B|$, is at least

$$1 - \sum_{i=1}^{\log_2 n}(\delta/2)^i \ge 1-\delta.$$

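For completeness, the geometric-series step behind the last bound reads as follows (a union bound over the at most $\log_2 n$ iterations, with $0 < \delta < 1$):

```latex
\Pr[\text{some iteration fails}]
  \le \sum_{i=1}^{\log_2 n} \left(\frac{\delta}{2}\right)^{i}
  \le \sum_{i=1}^{\infty} \left(\frac{\delta}{2}\right)^{i}
  = \frac{\delta/2}{1-\delta/2}
  = \frac{\delta}{2-\delta}
  \le \delta .
```

The last inequality uses $2 - \delta \ge 1$. Unlike the choice $\delta' = \delta/\log n$, the per-call failure probability $\delta/2$ does not depend on $n$.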
The running time of the $i$th iteration of the algorithm BICRITERIA is dominated by the above implementation of Line 3. By the assumption of the theorem, each of the $i$ calls to MEDIAN$(F_i, \delta/2)$ takes $O(\mathrm{SlowMedian})$ time. The computation of $G_j$ for every $1 \le j \le i$ takes overall $O(it|F_i|)$ time using order statistics. Since the size of $F$ is reduced by more than half in each "while" iteration, the running time of Line 3 over all the $O(\log n)$ iterations is therefore

$$\sum_{i=1}^{\log_2 n}O\left(it|F_i| + i\cdot\mathrm{SlowMedian}\right) \le O(nt)\cdot\sum_{i=1}^{\log_2 n}\frac{i}{2^{i-1}} + O(\log^2 n)\cdot\mathrm{SlowMedian} = O(nt + \log^2 n\cdot\mathrm{SlowMedian}).$$

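Spelled out, the sum uses $|F_i| \le n/2^{i-1}$ and the standard identity $\sum_{i\ge 1} i x^{i-1} = 1/(1-x)^2$ at $x = 1/2$:

```latex
\sum_{i=1}^{\log_2 n} i\,t\,|F_i|
  \le n t \sum_{i=1}^{\infty} \frac{i}{2^{i-1}}
  = \frac{n t}{(1 - 1/2)^{2}} = 4 n t,
\qquad
\sum_{i=1}^{\log_2 n} i \cdot \mathrm{SlowMedian}
  = \frac{\log_2 n\,(\log_2 n + 1)}{2}\,\mathrm{SlowMedian}
  = O(\log^2 n)\cdot \mathrm{SlowMedian}.
```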
By the assumption of this theorem, Line 7 can be computed in time SlowEpsApprox. We conclude that the total running time of the above implementation of BICRITERIA$(F, \varepsilon, \alpha, \beta)$ is Bicriteria, as desired. □



12 Applications: Bicriteria for Projective Clustering 

In this section we present several applications of the theorems presented in Section 11 addressing bicriteria approximation. Our applications are from the context of projective clustering. We consider several settings of parameters, and for each setting we prove appropriate results. We start with some notation.

12.1 Notation 

For a point $p \in \mathbb{R}^d$ and a set $Q \subseteq \mathbb{R}^d$, we define $\mathrm{dist}(p, Q) = \min_{q\in Q}\|p-q\|$. More generally, for an $m$-tuple $x = (x_1, \ldots, x_m)$ of subsets of $\mathbb{R}^d$, we define

$$\mathrm{dist}(p, x) = \min_{1\le i\le m}\mathrm{dist}(p, x_i) = \min_{1\le i\le m}\min_{q\in x_i}\|p-q\|.$$

We denote by $\mathrm{proj}(p, Q)$ the point $q \in Q$ such that $\mathrm{dist}(p, Q) = \|p-q\|$, where ties are broken arbitrarily. The span of $Q$ (i.e., the smallest affine subspace containing all points in $Q$) is denoted by $\mathrm{span}(Q)$. A $j$-flat in $\mathbb{R}^d$ is a translated (affine) $(j-1)$-dimensional subspace of $\mathbb{R}^d$. For example, a 1-flat in $\mathbb{R}^d$ is a set that consists of a single point.

Let $j, k \ge 1$ be two integers. Let $X(j,1)$ denote the set of all possible $j'$-flats in $\mathbb{R}^d$, $1 \le j' \le j$. Let $X(j,k) = \bigcup_{m=1}^{k}(X(j,1))^m$ be the union of tuples, where each tuple contains at most $k$ flats, each of dimension at most $j-1$. Let $P$ be a set of points in $\mathbb{R}^d$. For every point $p \in \mathbb{R}^d$, we define the corresponding function $f_p : X(j,k) \to [0,\infty)$ to be $f_p(x) = \mathrm{dist}(p, x)$, where $x = (x_1, \ldots, x_m)$. We define $F(P, j, k) = \{f_p \mid p \in P\}$ to be the union of these functions. For every set $S \subseteq F(P, j, k)$, we denote $P_S = \{p \in P \mid f_p \in S\}$.

For $x = (x_1, \ldots, x_m) \in X(j,k)$, we define $\mathrm{cost}(P, x) = \sum_{p\in P}\mathrm{dist}(p, x)$. For a set of tuples $Y = \{y_i\}_i \subseteq X(j,k)$, we define $\mathrm{cost}(P, Y) = \sum_{p\in P}\min_i\mathrm{dist}(p, y_i)$. Hence, $\mathrm{cost}(P, x) = \mathrm{cost}(F(P,j,k), x)$ and $\mathrm{cost}(P, Y) = \mathrm{cost}(F(P,j,k), Y)$.

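As a concrete reading of these definitions, the sketch below computes $\mathrm{dist}(p, x)$ for a tuple $x$ of flats and the two cost functions. Each flat is stored as a base point plus an orthonormal basis of its direction space, obtained by Gram-Schmidt from spanning points; following the convention above, the flat spanned by $j$ points is a $j$-flat. The representation and the function names are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]
Flat = Tuple[Vec, List[Vec]]   # (a point on the flat, orthonormal direction vectors)


def _sub(a: Vec, b: Vec) -> Vec:
    return tuple(x - y for x, y in zip(a, b))


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def _residual(w: Vec, basis: List[Vec]) -> Vec:
    """Remove from w its components along the orthonormal vectors in basis."""
    for u in basis:
        c = _dot(w, u)
        w = tuple(wi - c * ui for wi, ui in zip(w, u))
    return w


def span(points: Sequence[Vec], tol: float = 1e-12) -> Flat:
    """Affine span of the points, via Gram-Schmidt on the differences to points[0]."""
    origin, basis = points[0], []
    for q in points[1:]:
        w = _residual(_sub(q, origin), basis)
        norm = math.sqrt(_dot(w, w))
        if norm > tol:                       # skip affinely dependent points
            basis.append(tuple(wi / norm for wi in w))
    return origin, basis


def dist_flat(p: Vec, flat: Flat) -> float:
    origin, basis = flat
    w = _residual(_sub(p, origin), basis)
    return math.sqrt(_dot(w, w))


def dist(p: Vec, x: Sequence[Flat]) -> float:
    """dist(p, x) = min over the flats x_i of the tuple x."""
    return min(dist_flat(p, xi) for xi in x)


def cost(P: Sequence[Vec], x: Sequence[Flat]) -> float:
    return sum(dist(p, x) for p in P)


def cost_set(P: Sequence[Vec], Y: Sequence[Sequence[Flat]]) -> float:
    """cost(P, Y) = sum over p of min over tuples y in Y of dist(p, y)."""
    return sum(min(dist(p, y) for y in Y) for p in P)
```

For example, `span([(0.0, 0.0), (1.0, 0.0)])` is the $x$-axis (a 2-flat), and `dist((3.0, 4.0), (axis,))` is 4.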
12.2 α = 2^j, Small j and k

We start by showing how one can obtain an $(\alpha, \beta\log n)$ bicriteria approximation in which the approximation ratio $\alpha$ is rather large, and the resulting $\beta$ and running time are exponential in $j$ and $\log k$. Our proof has the following structure.

To apply our generic algorithm for bicriteria approximation, one must (iteratively) find robust medians for given subsets of $F$. Essentially, this is done via random sampling. Namely, as we have shown, for any such $F' \subseteq F$, taking a sufficiently large sample $S$, a robust median for $S$ is also one for $F'$. To efficiently find a $j$-subspace that acts as a $(1, 0, \alpha)$-median for $S$, we show that one does not have to consider all $j$-flats in $\mathbb{R}^d$, but rather only those spanned by $j$ points of $S$. This effectively allows us to consider a generalized range space corresponding to $F(P, j, k)$ of dimension $O(jk)$ (instead of the naive dimension $O(dk)$), which makes the size of the random sample $S$ independent of $d$. Hence, using such small random samples $S$, and exhaustively computing a robust median for them, will yield our result. A detailed proof follows.

Theorem 12.1 ([FFSS07]) Let $P$ be a finite set of points in $\mathbb{R}^d$. Let $1 \le j \le d$. There is a set $M \subseteq P$, $|M| \le j$, and a flat $x = \mathrm{span}(M)$ such that

$$\mathrm{cost}(P, x) \le 2^j\min_{x^*\in X(j,1)}\mathrm{cost}(P, x^*). \tag{58}$$

Theorem 12.2 Let $P$ be a finite set of points in $\mathbb{R}^d$, and $1 \le j \le d+1$. Let $S \subseteq F(P, j, k)$,

$$X(S) = \{x \in X(j,1) : x = \mathrm{span}(M),\ M \subseteq P_S,\ |M| \le j\},$$

and $X_k(S) = (X(S))^k$. Then

(i) $X(S)$ is of size $O(|S|^j)$, and can be computed in $O(dj^2)\cdot|S|^j$ time.

(ii) $\dim(F(P,j,k), X_k) = O(jk)$.

(iii) $X_k(S)$ is a $(1, 0, 2^j)$-centroid set for $S$.

Proof. (i) There are $O(|S|^j)$ subsets of $S$ of size at most $j$. For a fixed subset $Q$ of $|Q| \le j$ points from $S$, we use the QR decomposition in order to compute the flat that is spanned by them. This takes $O(dj^2)$ time.

(ii) We prove the case $k = 1$. The case $k > 1$ then follows from Lemma 6.5. Fix $x \in X(S)$. For $r \ge 0$, let $\mathrm{range}(S, x, r) = \{f \in S \mid f(x) \le r\}$. Hence, $|\{\mathrm{range}(S, x, r) \mid r \ge 0\}| \le |S|+1$. Therefore,

$$|\{\mathrm{range}(S, x, r) \mid x \in X(S), r \ge 0\}| \le O(|S|^j)\cdot(|S|+1) = |S|^{O(j)}.$$

By our definitions, we obtain $\dim(F(P,j,1), X) = O(j)$, as desired.

(iii) Follows from Lemma 10.4 and Theorem 12.1. □

Lemma 12.3 Let $P$ be a finite set of points in $\mathbb{R}^d$, and $j, k \ge 1$ be two integers. Let $\delta, \varepsilon \in (0, 1/10)$, $\gamma \in [0, 1]$, and

Then, a $(\gamma, \varepsilon, 2^j, O(s^j)/k)$-median for $F(P, j, k)$ can be computed, with probability at least $1-\delta$, in time

$$O(ds^2) + s^{O(j)}.$$
Proof. Let $F = F(P, j, 1)$, $F_k = F(P, j, k)$, and $X_k$ be defined as in Theorem 12.2. Let $S_k$ be a random sample of $c\cdot s$ i.i.d. functions from $F_k$, where $c$ is a sufficiently large constant that will be determined later in the proof. Here, we assume that $|F_k| \ge c\cdot s$; otherwise we set $S_k = F_k$. Without loss of generality, we assume that the points in $P$ corresponding to $S_k$ are in $\mathbb{R}^{cs-1}$; otherwise we compute an orthogonal base for these points in $O(ds^2)$ time using the QR decomposition.

Let $S = \{f \in F \mid f_k \in S_k\}$. By applying Theorem 12.2 with $k = 1$, a $(1, 0, 2^j)$-centroid set $X(S)$, $|X(S)| = O(s^j)$, for $S$ can be computed in $O(dj^2)\cdot s^j$ time. By applying Lemma 10.5 with $F = S$, $F_k = S_k$, $\mathrm{cent} = X(S_k)$, $\varepsilon/4$ and $(1-\varepsilon/4)\gamma$, there is a $((1-\varepsilon/4)\gamma, \varepsilon/4, 2^j)$-median $x \in (X(S_k))^k = X_k(S_k)$ for $S_k$. Applying Lemma 9.6 with the function space $(F_k, X_k)$ yields that with probability at least $1-\delta$, $x$ is a $(\gamma, \varepsilon, 2^j)$-median of $F_k$.

Let $Y$ be an arbitrary partition of $X(S_k)$ into $\beta = \lceil|X(S_k)|/k\rceil$ sets of size at most $k$. Since $x \in X_k(S_k)$, we have $\mathrm{cost}(F_k, Y) \le \mathrm{cost}(F_k, x)$. Since $x$ is a $(\gamma, \varepsilon, 2^j)$-median of $F_k$, the last inequality implies that $Y$ is a $(\gamma, \varepsilon, 2^j, \beta)$-median of $F_k$.

□ 
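The two combinatorial steps used here, enumerating the index sets $M$ with $1 \le |M| \le j$ that define $X(S)$ in Theorem 12.2 and cutting a candidate list into $\beta = \lceil|X(S)|/k\rceil$ groups of at most $k$ flats, are sketched below. Each index set stands for the flat $\mathrm{span}(M)$, which would be computed by a QR decomposition; the helper names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def candidate_index_sets(n_points: int, j: int) -> List[Tuple[int, ...]]:
    """All M with 1 <= |M| <= j; each M represents the flat span(M) in X(S)."""
    return [M for size in range(1, j + 1)
            for M in itertools.combinations(range(n_points), size)]


def partition_into_tuples(cands: Sequence, k: int) -> List[list]:
    """Arbitrary partition into beta = ceil(len(cands) / k) groups of size <= k."""
    beta = math.ceil(len(cands) / k)
    return [list(cands[i * k:(i + 1) * k]) for i in range(beta)]
```

Every flat of the tuple $x$ lies in some group, so the union of the groups is never worse than $x$; this is the inequality $\mathrm{cost}(F_k, Y) \le \mathrm{cost}(F_k, x)$ in the proof.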

Theorem 12.4 Let $P$ be a finite set of points in $\mathbb{R}^d$, and $j, k \ge 1$ be two integers. Let $\delta \in (0, 1/10)$, and let

$$s = jk + \log(1/\delta).$$

A $(2^{j+1}, s^{O(j)}k^{-1}\log n)$-bicriteria approximation for $F(P, j, k)$ can be computed, with probability at least $1-\delta$, in time

$$\mathrm{Bicriteria} = O(nds^{O(j)}) + O(ds^2\log^2 n) + s^{O(j)}\log^2 n = O(nds^{O(j)}).$$

Proof. By Lemma 12.3, a $(3/4, 1/2, 2^j, s^{O(j)}/k)$-median for a set $F' \subseteq F(P, j, k)$ can be computed, with probability at least $1-\delta/2$, in $\mathrm{SlowMedian} = O(ds^2) + s^{O(j)}$ time. Similarly, using $\gamma' = 1$ and $\varepsilon' = 0$ in the proof of Lemma 12.3, a $(1, 0, 2^j, 2^{O(j)}/k)$-median for a set $F'$ of size $|F'| = O(1)$ can be computed in $\mathrm{SlowEpsApprox} = |X_k| = O(d) + 2^{O(j)}$ time. The time it takes to compute the distance between a point and a set of $s^{O(j)}$ $j$-flats is $t = O(ds^{O(j)})$. By applying Theorem 11.3 with $\varepsilon = 1/2$ and $\beta = k^{-1}s^{O(j)}$, and using $(1+1/2)2^j \le 2^{j+1}$, we infer that a $(2^{j+1}, k^{-1}s^{O(j)}\log n)$-bicriteria approximation for $F(P, j, k)$ can be computed, with probability at least $1-\delta$, in time

$$\mathrm{Bicriteria} = O(1)\cdot(nt + \log^2 n\cdot\mathrm{SlowMedian} + \mathrm{SlowEpsApprox}) = O(nds^{O(j)}) + O(ds^2\log^2 n) + s^{O(j)}\log^2 n = O(nds^{O(j)}).$$

□

12.3 α = 1 + ε, Small j and k

We now address an $(\alpha, \beta\log n)$ bicriteria approximation in which the approximation ratio $\alpha$ is small. Our proof follows a similar structure to that given in the previous case of $\alpha = 2^j$. The main difference here is that we need an efficient way to find a $(1, 0, 1+\varepsilon)$-median for random samples $S$ of $F$. We first show, as before, that one need not consider all $j$-flats in $\mathbb{R}^d$, but rather only $j$-flats contained in the span of approximately $jk/\varepsilon$ points in $S$. As there are infinitely many such $j$-flats, this will not suffice for our needs, and thus we discretize the set of potential medians to obtain a final set of potential medians of size roughly $|S|^{jk/\varepsilon}$. Once our potential set of medians (i.e., our centroid set) has been established, we continue as we did in the previous section. A detailed proof follows. We start by presenting a few known assertions.
Theorem 12.5 ([SV07]) Let $P$ be a set of points in $\mathbb{R}^d$, $1 \le j \le d$, and let $0 < \varepsilon < 1/4$. Then there is a set $M \subseteq P$, $|M| \le 10j\log(1/\varepsilon)/\varepsilon$, and a $j$-flat $x \subseteq \mathrm{span}(M)$ such that

$$\mathrm{cost}(P, x) \le (1+\varepsilon)\min_{x^*\in X(j,1)}\mathrm{cost}(P, x^*).$$

Lemma 12.6 ([SA95, FMSW10]) Let $P$ be a set of points in $\mathbb{R}^d$, and $j, k \ge 1$ be two integers. Then

(i) $\dim(F(P, j, k)) = O(dk)$.

(ii) A $(1, 0, 1+\varepsilon)$-centroid set $C$ for $F(P, j, k)$ of size $|C| = n^{O(djk\log(1/\varepsilon))}$ can be constructed in $O(|C|)$ time.

We now present a technical lemma that we will use in our proofs to come. 

Lemma 12.7 Let $Q$ be an $m$-dimensional subspace of $\mathbb{R}^d$, and let $Q'$ be an $(m+1)$-dimensional subspace that contains $Q$. Put $p \in \mathbb{R}^d$. There is a point $p' \in Q'$ such that for every $j$, $1 \le j \le m-1$, and every $j$-flat $x \subseteq Q$ we have

$$\mathrm{dist}(p, x) = \mathrm{dist}(p', x).$$

Moreover, $p'$ can be computed in $O(d)$ time.

Proof. Let $p \in \mathbb{R}^d$. Let $p' \in Q'$ be such that $\mathrm{proj}(p', Q) = \mathrm{proj}(p, Q)$ and $\mathrm{dist}(p', Q) = \mathrm{dist}(p, Q)$. The point $p'$ can be computed by projecting $p$ on $Q$ and then translating it, inside $Q'$, in a direction that is orthogonal to $Q$. Let $x \subseteq Q$ be a $j$-flat. By the Pythagorean Theorem and the construction of $p'$, for every $q \in Q$ we have

$$\|p-q\| = \sqrt{\mathrm{dist}(p,Q)^2 + \|\mathrm{proj}(p,Q)-q\|^2} = \sqrt{\mathrm{dist}(p',Q)^2 + \|\mathrm{proj}(p',Q)-q\|^2} = \|p'-q\|.$$

Since $x \subseteq Q$, we have by the last equation that $\mathrm{dist}(p, x) = \min_{q\in x}\|p-q\| = \mathrm{dist}(p', x)$, as desired. □

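The construction of $p'$ in this proof is easy to check numerically. The sketch below assumes that $Q$ is a linear subspace given by an orthonormal basis and that $Q'$ is spanned by that basis together with one extra unit vector $u \perp Q$ (for an affine $Q$ one would translate first); the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, ...]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(p: Vec, basis: Sequence[Vec]) -> Vec:
    """proj(p, Q) for Q spanned by the orthonormal vectors in basis."""
    coeffs = [dot(p, u) for u in basis]
    return tuple(sum(c * u[t] for c, u in zip(coeffs, basis)) for t in range(len(p)))


def lift(p: Vec, basis: Sequence[Vec], u: Vec) -> Vec:
    """p' in Q' with proj(p', Q) = proj(p, Q) and dist(p', Q) = dist(p, Q)."""
    pr = project(p, basis)
    r = math.dist(p, pr)                      # dist(p, Q)
    return tuple(a + r * b for a, b in zip(pr, u))
```

With $Q = \mathrm{span}(e_1, e_2)$ in $\mathbb{R}^4$ and $u = e_3$, the point $p = (1, 2, 3, 4)$ lifts to $p' = (1, 2, 5, 0)$, and every $q \in Q$ is at the same distance from $p$ and $p'$.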
The following is a generalization of Theorem 12.2(ii).

Lemma 12.8 Let $P$ be a finite set of points in $\mathbb{R}^d$. Let $m \ge j \ge 1$ and $k \ge 1$ be integers. For every set $S \subseteq F(P, j, k)$, let

$$X(S) = \{x \in X(j,1) : x \subseteq \mathrm{span}(M),\ M \subseteq P_S,\ |M| \le m\}, \tag{59}$$

and $X_k(S) = (X(S))^k$. Then $\dim(F(P, j, k), X_k) = O(mk)$.

Proof. We prove the case $k = 1$. The case $k > 1$ follows from Lemma 6.5. Put $S \subseteq F(P, j, 1)$, $M \subseteq P_S$ such that $|M| \le m$, and $Q = \mathrm{span}(M)$. Let $X_Q = \{x \in X(j,1) \mid x \subseteq Q\}$ denote all the flats of $X(j,1)$ that are contained in $Q$. Let $Q'$ be an $(m+1)$-subspace that contains $Q$. By Lemma 12.7, for every $p \in P_S$ there is a point $p' \in Q'$ such that

$$\mathrm{dist}(p, x) = \mathrm{dist}(p', x) \quad\text{for every } x \in X_Q. \tag{60}$$

For every $p \in P$, define $f_{p'} : X_Q \to [0,\infty)$ to be $f_{p'}(x) = \mathrm{dist}(p', x)$. Let $S' = \{f_{p'} \mid p \in P_S\}$ be the union of these functions.

Since both $P_{S'}$ and the flats of $X_Q$ are contained in the $(m+1)$-dimensional subspace $Q'$, applying Lemma 12.6(i) with $d = m+1$ implies that $\dim(S') = O(m)$. By definition of $\dim(\cdot)$, we obtain

$$|\{\mathrm{range}(S', x, r) \mid x \in X_Q, r \ge 0\}| \le |S'|^{O(m)} \le |S|^{O(m)}. \tag{61}$$
By (60), for every $r \ge 0$, $x \in X_Q$, and a set $\mathrm{range}(S, x, r) = \{f \in S \mid f(x) \le r\} = \{f_{p_1}, f_{p_2}, \ldots\}$ there is a corresponding distinct set $\mathrm{range}(S', x, r) = \{f \in S' \mid f(x) \le r\} = \{f_{p'_1}, f_{p'_2}, \ldots\}$. Therefore,

$$|\{\mathrm{range}(S, x, r) \mid x \in X_Q, r \ge 0\}| = |\{\mathrm{range}(S', x, r) \mid x \in X_Q, r \ge 0\}|.$$

Using the last equation with (61), we obtain

$$|\{\mathrm{range}(S, x, r) \mid x \in X_Q, r \ge 0\}| \le |S|^{O(m)}.$$

Taking the union over every possible choice of $Q$ yields

$$\sum_{Q\in\{\mathrm{span}(M)\,:\,M\subseteq P_S,\ |M|\le m\}}|\{\mathrm{range}(S, x, r) \mid x \in X_Q, r \ge 0\}| \le |P_S|^{O(m)}\cdot|S|^{O(m)} = |S|^{O(m)}.$$

Using (59) with the last equation yields

$$|\{\mathrm{range}(S, x, r) \mid x \in X(S), r \ge 0\}| \le \sum_{Q\in\{\mathrm{span}(M)\,:\,M\subseteq P_S,\ |M|\le m\}}|\{\mathrm{range}(S, x, r) \mid x \in X_Q, r \ge 0\}| \le |S|^{O(m)}.$$

By our definitions, we obtain $\dim(F(P, j, 1), X) = O(m)$, as desired. □

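The counting in the proofs of Theorem 12.2(ii) and Lemma 12.8 rests on one observation: for a fixed $x$, the sets $\mathrm{range}(S, x, r)$, $r \ge 0$, are nested threshold sets, so there are at most $|S|+1$ of them, and multiplying by the number of candidates $x$ bounds the total. A small sketch of the fixed-$x$ count (the function name is illustrative):

```python
from typing import FrozenSet, List, Set


def distinct_ranges(values: List[float]) -> Set[FrozenSet[int]]:
    """values[i] = f_i(x) >= 0 for one fixed x; return {range(S, x, r) : r >= 0} as index sets."""
    # Any r >= 0 yields the same set as the largest threshold in {0} U values that is <= r.
    thresholds = {0.0, *values}
    return {frozenset(i for i, v in enumerate(values) if v <= r) for r in thresholds}
```

For the values $(3, 1, 1, 0, 5)$ there are exactly 4 distinct ranges, within the bound $|S| + 1 = 6$.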
Theorem 12.9 Let $P$ be a finite set of points in $\mathbb{R}^d$, and $k \ge 1$ be an integer. Let $S \subseteq F(P, j, k)$,

$$X(S) = \left\{x \in X(j,1) : x \subseteq \mathrm{span}(M),\ M \subseteq P_S,\ |M| \le \frac{10j\log(1/\varepsilon)}{\varepsilon}\right\}, \tag{62}$$

and $X_k(S) = (X(S))^k$. Then

(i) $\dim(F(P, j, k), X_k) = O\!\left(\frac{jk\log(1/\varepsilon)}{\varepsilon}\right)$.

(ii) $X_k(S)$ is a (possibly infinite) $(1, 0, 1+\varepsilon, 1)$-centroid set for $S$.

Proof. (i) Follows from Lemma 12.8. (ii) Follows from Lemma 10.4 and Theorem 12.5. □

The following centroid set, which is constructed using the bound of Lemma 12.6, is similar to the larger and somewhat less general centroid set that is constructed in [DRVW06].

Lemma 12.10 Let $P$ be a set of points in $\mathbb{R}^d$. Let $S \subseteq F(P, j, k)$, and let $\varepsilon \in (0, 1)$. A $(1, 0, 1+\varepsilon)$-centroid set $C$ for $S$ can be computed in $O(d\cdot|S|^2 + |C|)$ time, where

$$|C| = |S|^{O(j^2k\log^2(1/\varepsilon)/\varepsilon)}.$$

Moreover, $C \subseteq X_k(S)$, where $X_k(S)$ is defined in Theorem 12.9.

Proof. We prove the case $k = 1$. The case $k > 1$ follows by applying Lemma 10.4 with $F = S$ and $\beta = 1$. Let $\varepsilon' = \varepsilon/3$, $m = 10j\log(1/\varepsilon')/\varepsilon'$, and $M \subseteq P_S$ such that $|M| \le m$. Let $Q = \mathrm{span}(M)$, $X_Q = \{x \in X(S) \mid x \subseteq Q\}$, and let $Q'$ be an $(m+1)$-subspace that contains $Q$. By Lemma 12.7, for every $p \in P_S$ there is a point $p' \in Q'$ such that

$$\mathrm{dist}(p, x) = \mathrm{dist}(p', x) \quad\text{for every } x \in X_Q. \tag{63}$$
For every $p \in P$, define $f_{p'} : X_Q \to [0,\infty)$ to be $f_{p'}(x) = \mathrm{dist}(p', x)$. Let $S_Q = \{f_{p'} \mid p \in P_S\}$ be the union of these functions. Substituting $P = S_Q$ and $d = m+1$ in Lemma 12.6(ii) yields that a $(1, 0, 1+\varepsilon')$-centroid set $C_Q$ for $S_Q$ of size $|C_Q| = |S|^{O(mj\log(1/\varepsilon))}$ can be computed in $O(|C_Q|)$ time. By (63), $C_Q$ is also a $(1, 0, 1+\varepsilon')$-centroid set for $S$ restricted to $X_Q$. Let $X'(S) = \bigcup_Q X_Q$, where the union is over every $Q = \mathrm{span}(M)$ such that $M \subseteq P_S$, $|M| \le m$. Hence, $C = \bigcup_Q C_Q$ is a $(1, 0, 1+\varepsilon')$-centroid set for $S$ restricted to $X'(S)$. By Theorem 12.9(ii), $X'(S)$ is a $(1, 0, 1+\varepsilon')$-centroid set for $S$. Hence, by definition, $C$ is a $(1, 0, (1+\varepsilon')^2)$-centroid set for $S$. Since $(1+\varepsilon')^2 \le 1+3\varepsilon' \le 1+\varepsilon$, $C$ is a $(1, 0, 1+\varepsilon)$-centroid set for $S$, as desired.

The size of $C$ is

$$|P_S|^{O(m)}\cdot|C_Q| = |S|^{O(j^2\log^2(1/\varepsilon)/\varepsilon)}.$$

For the running time, we may compute a base for $\mathrm{span}(S)$ using, for example, the QR decomposition in $O(d|S|^2)$ time, and then compute $C$ in the $|S|$-dimensional space. □

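The inequality $(1+\varepsilon')^2 \le 1+\varepsilon$ used in this proof is elementary; with $\varepsilon' = \varepsilon/3 \le 1$:

```latex
(1+\varepsilon')^2 = 1 + 2\varepsilon' + \varepsilon'^2
  \le 1 + 2\varepsilon' + \varepsilon'
  = 1 + 3\varepsilon' = 1 + \varepsilon
  \qquad (0 < \varepsilon' \le 1).
```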
Lemma 12.11 Let $P$ be a finite set of points in $\mathbb{R}^d$, and $j, k \ge 1$ be two integers. Let $\delta, \varepsilon \in (0, 1/10)$ and $\gamma \in (0, 1]$. A $(\gamma, \varepsilon, 1+\varepsilon)$-median for $F(P, j, k)$ can be computed, with probability at least $1-\delta$, in time $O(ds^2) + s^{O(j^2k\log^2(1/\varepsilon)/\varepsilon)}$, where

Proof. Let $F = F(P, j, k)$, and let $S$ be a random sample of $c\cdot s$ i.i.d. functions from $F$, for some constant $c \ge 1$ that will be determined later. Here, we assume that $|F| \ge c\cdot s$; otherwise, let $S = F$. By Lemma 12.10, a $(1, 0, 1+\varepsilon)$-centroid set $C$ for $S$ can be computed in $O(|C| + ds^2)$ time, where

$$|C| = s^{O(j^2k\log^2(1/\varepsilon)/\varepsilon)}.$$

By Lemma 10.3, $C$ is also a $((1-\varepsilon/4)\gamma, \varepsilon/4, 1+\varepsilon)$-centroid set for $S$. Using exhaustive search over $C$, a $((1-\varepsilon/4)\gamma, \varepsilon/4, 1+\varepsilon)$-median $x \in C$ of $S$ can be computed in $O(ds^2 + |C|)$ time. Let $X_k(\cdot)$ be defined as in Theorem 12.9. By Theorem 12.9, $X_k(S)$ is a $(1, 0, 1+\varepsilon)$-centroid set for $S$, and $\dim(F, X_k) = O(jk\log(1/\varepsilon)/\varepsilon)$. By Lemma 12.10, $C$ is contained in $X_k(S)$, so $x \in X_k(S)$. By Theorem 9.6, for a large enough constant $c$ we have that, with probability at least $1-\delta$, $x$ is a $(\gamma, \varepsilon, 1+\varepsilon)$-median for $F(P, j, k)$. □

Theorem 12.12 Let $P$ be a finite set of points in $\mathbb{R}^d$, and $k, j \ge 1$ be two integers. Let $\delta \in (0, 1/10)$ and

Then a $(1+\varepsilon, \log n)$-bicriteria approximation for $F(P, j, k)$ can be computed, with probability at least $1-\delta$, in time

$$\mathrm{Bicriteria} = O(ndjk) + O(dr^2) + r^{O(j^2k\log^2(1/\varepsilon)/\varepsilon)}\log^2 n.$$

Proof. By applying Lemma 12.11 with $\gamma = 3/4$ and $\delta/2$, a $(3/4, \varepsilon, 1+\varepsilon)$-median for a set $F' \subseteq F(P, j, k)$ can be computed, with probability at least $1-\delta/2$, in $\mathrm{SlowMedian} = O(dr^2) + r^{O(j^2k\log^2(1/\varepsilon)/\varepsilon)}$ time. For a set $S \subseteq F(P, j, k)$, $|S| = O(1/\varepsilon) \le r$, a $(1, 0, 1+\varepsilon)$-median $x$ of $S$ can be computed in SlowMedian time using exhaustive search on the centroid set in Lemma 12.10. By applying Theorem 11.3 with $\beta = 1$ and $t = djk$, a $(1+\varepsilon, \log n)$-bicriteria approximation for $F(P, j, k)$ can be computed, with probability at least $1-\delta$, in time

$$\mathrm{Bicriteria} = O(ndjk) + O(dr^2) + r^{O(j^2k\log^2(1/\varepsilon)/\varepsilon)}\log^2 n.$$

□
12.4 α = 1 + ε, Large k, Small j

Lemma 12.13 Let $P$ be a finite set of points in $\mathbb{R}^d$, and $j, k \ge 1$ be two integers. Let $\delta, \varepsilon \in (0, 1/10)$ and $\gamma \in (0, 1]$. Let $\beta = s^{\Theta(j^2\log^2(1/\varepsilon)/\varepsilon)}$, where

Then a $(\gamma, \varepsilon, 1+\varepsilon, \beta)$-median for $F(P, j, k)$ can be computed, with probability at least $1-\delta$, in $O(ds^2 + k\beta)$ time.

Proof. Let $F_k = F(P, j, k)$ and $F = F(P, j, 1)$. Let $S_k$ be a random sample of $c\cdot s$ i.i.d. functions from $F_k$, for some constant $c \ge 1$. Here, we assume that $|F| \ge c\cdot s$; otherwise, let $S_k = F_k$. Let $S = \{f \in F \mid f_k \in S_k\}$. By applying Lemma 12.10 with $k = 1$, a $(1, 0, 1+\varepsilon)$-centroid set $X(S)$ for $S$, $|X(S)| = k\beta$, can be computed in $O(ds^2 + k\beta)$ time. Applying Lemma 10.5 with $F = S$ and $F_k = S_k$ yields that there is $x \in (X(S))^k$ which is a $((1-\varepsilon/4)\gamma, \varepsilon/4, 1+\varepsilon)$-median for $S_k$. Let $X_k(S_k) = (X(S))^k$. Applying Lemma 9.6 with the function space $(F_k, X_k)$ yields that with probability at least $1-\delta$, $x$ is a $(\gamma, \varepsilon, 1+\varepsilon)$-median of $F_k$. Assume that this event indeed occurs.

Let $Y$ be an arbitrary partition of $X(S)$ into $\beta = \lceil|X(S)|/k\rceil$ sets of size at most $k$. Since $x \in X_k(S_k)$, we have $\mathrm{cost}(F_k, Y) \le \mathrm{cost}(F_k, x)$. Since $x$ is a $(\gamma, \varepsilon, 1+\varepsilon)$-median of $F_k$, the last inequality implies that $Y$ is a $(\gamma, \varepsilon, 1+\varepsilon, \beta)$-median of $F_k$. □

Theorem 12.14 Let $P$ be a finite set of points in $\mathbb{R}^d$, and $j, k \ge 1$ be two integers. Let $\varepsilon, \delta \in (0, 1/10)$,

and $\beta = r^{O(j^2\log^2(1/\varepsilon)/\varepsilon)}$. Then a $(1+\varepsilon, \beta k^{-1}\log n)$-bicriteria approximation for $F(P, j, k)$ can be computed, with probability at least $1-\delta$, in time

$$\mathrm{Bicriteria} = O(nd\beta) + O(dr^2\log^2 n) + r^{O(j^2\log^2(1/\varepsilon)/\varepsilon)}\log^2 n.$$

Proof. Let $\beta = r^{O(j^2\log^2(1/\varepsilon)/\varepsilon)}$. By applying Lemma 12.13 with $\gamma = 3/4$ and $\delta/2$, a $(3/4, \varepsilon, 1+\varepsilon, \beta/k)$-median $x$ for a set $F' \subseteq F(P, j, k)$ can be computed, with probability at least $1-\delta/2$, in $\mathrm{SlowMedian} = O(dr^2 + k\beta)$ time.

For a set $S \subseteq F(P, j, k)$, $|S| = O(1/\varepsilon)$, a $(1, 0, 1+\varepsilon)$-centroid set $X(S)$ for $S$, $|X(S)| = k\beta$, can be computed in $O(dr^2 + \beta)$ time using Lemma 12.10. Applying Lemma 10.5 with $F = S$, $F_k = S_k$, $\gamma = 1$ and $\varepsilon = 0$ yields that there is $x \in (X(S))^k$ which is a $(1, 0, 1+\varepsilon)$-median for $S_k$. Hence, an arbitrary partition $Y$ of $X(S)$ into $k$-tuples is a $(1, 0, 1+\varepsilon, \beta/k)$-median for $S_k$ that can be computed in $\mathrm{SlowEpsApprox} = O(k\beta)$ time.

The time it takes to compute the distance between a point and a set of $\beta$ flats is $t = O(d\beta)$. By Theorem 11.3, a $(1+\varepsilon, \beta k^{-1}\log n)$-bicriteria approximation for $F(P, j, k)$ can thus be computed, with probability at least $1-\delta$, in time

$$\mathrm{Bicriteria} = O(nd\beta) + O(dr^2\log^2 n) + k\beta\log^2 n.$$

□
12.5 α = 1 + ε, Large j and k

Lemma 12.15 Let $P$ be a set of $n$ points in $\mathbb{R}^d$, and $j, k \ge 1$ be two integers. Let $\delta, \varepsilon \in (0, 1/10)$ and $\gamma \in (0, 1]$. Let $S$ be a random sample of

c ffklog^il/e) , 1 
' ^ ' ^ ' log - 



i.i.d functions from P, where c is a sufficiently large constant that is determined in the proof. Let 

$$Y = \{(x_1, \ldots, x_k) \in X(j,k) \mid x_i \subseteq \mathrm{span}(S) \text{ for every } 1 \le i \le k\}.$$

Then, with probability at least $1-\delta$, $Y$ is a $(\gamma, \varepsilon, 1+\varepsilon, \infty)$-median for $F(P, j, k)$.

Proof. Let $X_k$ be defined as in Theorem 12.9. By Theorem 12.9(ii), $X_k(S)$ is a $(1, 0, 1+\varepsilon)$-centroid set for $S$. Let $\gamma' \le 1$ and $\varepsilon' > 0$. By Lemma 10.3, $X_k(S)$ is also a $(\gamma', \varepsilon', 1+\varepsilon)$-centroid set for $S$. Hence, there is a $(\gamma', \varepsilon', 1+\varepsilon)$-median $x \in X_k(S)$ for $S$. Since $X_k(S) \subseteq Y$, we have that $Y$ is a $(\gamma', \varepsilon', 1+\varepsilon, \infty)$-median for $S$.

For $\varepsilon' = \varepsilon/4$ and $\gamma' = (1-\varepsilon/4)\gamma$, there is a $((1-\varepsilon/4)\gamma, \varepsilon/4, 1+\varepsilon)$-median $x \in X_k(S)$ for $S$. By Theorem 12.9(i), we have $\dim(F(P, j, k), X_k) = O(j^2k\log(1/\varepsilon)/\varepsilon)$. Using this with Theorem 9.6, we infer that there is a constant $c$ such that, with probability at least $1-\delta$, $x$ is a $(\gamma, \varepsilon, 1+\varepsilon)$-median for $F(P, j, k)$. Assume that this event indeed occurs. Since $x \in X_k \subseteq Y$, we have that $Y$ is a $(\gamma, \varepsilon, 1+\varepsilon, \infty)$-median for $F(P, j, k)$. □

Theorem 12.16 Let $P$ be a finite set of points in $\mathbb{R}^d$, and $k \ge 1$, $j \ge 1$ be two integers. Let $\varepsilon, \delta \in (0, 1/10)$ and

1 ( fk\og\l/e) , ^ 1 

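Then an $O(r\log n)$-dimensional subspace $Z$ of $\mathbb{R}^d$ that, with probability at least $1-\delta$, satisfies

$$\mathrm{cost}(P, Z) \le (1+\varepsilon)\min_{x^*\in X(j,k)}\mathrm{cost}(P, x^*)$$

can be computed in time

$$\mathrm{Bicriteria} = O(ndr) + O(dr^2\log^2 n).$$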

Proof. By applying Lemma 12.15 with $\gamma = 3/4$ and $\delta/2$, a $(3/4, \varepsilon, 1+\varepsilon, \infty)$-median $Y$ of a set $F' \subseteq F(P, j, k)$ can be computed, with probability at least $1-\delta/2$, such that all the $j$-flats of $Y$ are contained in an $O(r)$-flat. For a set $F'$ of size $O(1/\varepsilon)$, the span of (the points corresponding to) $F'$ contains a $(1, 0, 1+\varepsilon, 1)$-median of $F'$.

By definition of $Y$, for every $p \in \mathbb{R}^d$ we have $\mathrm{dist}(p, Y) = \mathrm{dist}(p, \mathrm{span}(S))$. After computing an orthogonal base for $S$ in $O(dr^2)$ time, the time it takes to compute $\mathrm{dist}(p, Y)$ is $t = O(dr)$. By Theorem 11.3, an $O(r\log n)$-flat $Z$ that, with probability at least $1-\delta$, satisfies

$$\mathrm{cost}(P, Z) \le (1+\varepsilon)\min_{x^*\in X(j,k)}\mathrm{cost}(P, x^*)$$

can be computed in time

$$\mathrm{Bicriteria} = O(ndr) + O(dr^2\log^2 n).$$

□ 
12.6 k-Median in a Metric Space

Theorem 12.17 Let $(P, \mathrm{dist})$ be a metric space of $n$ points. Let $k \ge 1$ be an integer, $\varepsilon > 0$, and

A set $B \subseteq P$ of $O(\beta\log n)$ points can be computed in $O(ndk + \beta\log^2 n)$ time such that, with probability at least $1-\delta$,

$$\mathrm{cost}(P, B) \le (2+\varepsilon)\cdot\min_{x\in P^k}\mathrm{cost}(P, x).$$

Proof. For every $p \in P$, define $f_p : P^k \to [0,\infty)$ to be $f_p(x) = \mathrm{dist}(p, x)$. Let $F = \{f_p \mid p \in P\}$. For every set $S \subseteq F$ that corresponds to a set $S \subseteq P$, let $X(S) = S^k$. For every $x \in X(S)$ and $r \ge 0$, let $\mathrm{range}(S, x, r) = \{f \in S \mid f(x) \le r\}$. Hence,

$$|\mathrm{ranges}(S)| = |\{\mathrm{range}(S, x, r) \mid x \in X(S), r \ge 0\}| \le |S|^k\cdot(|S|+1) = |S|^{O(k)},$$

so $\dim(F, X) = O(k)$.

Let $\gamma = 3/4$, $\varepsilon \in (0, 1/10)$, and $\alpha = 2$. Let $F' \subseteq F$. If $|F'| > \beta$, let $S$ be a random sample of $\beta$ i.i.d. functions from $F'$. Otherwise, we define $S = F'$. Let $x^* = (x_1^*, \ldots, x_k^*)$ be a $(\gamma, \varepsilon, 1)$-median for $S$. Let $y = (y_1, \ldots, y_k) \in S^k$ be such that $y_i$ is the closest point to $x_i^*$ in $S$, for every $1 \le i \le k$. Let $S_{x^*}$ denote the closest $\lceil(1-\varepsilon)\gamma|S|\rceil$ points of $S$ to $x^*$. Fix $p \in S_{x^*}$, and let $x_p$ denote the closest point in $x^*$ to $p$. By the triangle inequality, $\mathrm{dist}(p, y) \le \mathrm{dist}(p, x_p) + \mathrm{dist}(x_p, y)$, and by definition of $y$, $\mathrm{dist}(x_p, y) \le \mathrm{dist}(x_p, p)$. Hence, $\mathrm{dist}(p, y) \le 2\,\mathrm{dist}(x_p, p) = 2\,\mathrm{dist}(p, x^*)$. Summing over every $p \in S_{x^*}$ yields $\mathrm{cost}(S_{x^*}, y) \le 2\,\mathrm{cost}(S_{x^*}, x^*)$. By the last inequality, $y$ is a $((1-\varepsilon)\gamma, \varepsilon, \alpha)$-median of $S$. Since $y \in S^k$, we have that $S^k$ contains a $((1-\varepsilon)\gamma, \varepsilon, \alpha)$-median of $S$.

If $|F'| > 1/\varepsilon$, by applying Corollary 9.7 with $F = F'$ and $Y = S$ we can compute, with probability at least $1-\delta/2$, a $(\gamma, 4\varepsilon, \alpha, \beta)$-median of $F'$ in time $O(|S|) = O(\beta)$. If $|F'| \le 1/\varepsilon$, the set $S = F'$ is a trivial $(1, 0, \alpha, \beta)$-median for $F'$.

Applying Theorem 11.3 with $X = P^k$, $t = O(dk)$, $\mathrm{SlowMedian} = \mathrm{SlowEpsApprox} = O(\beta)$, and $16\varepsilon$ yields that a set $Z \subseteq P$, $|Z| \le k\beta\log_2 n$, can be computed such that, with probability at least $1-\delta$,

$$\mathrm{cost}(F, Z) \le (1+\varepsilon/2)\alpha\cdot\min_{x\in P^k}\mathrm{cost}(F, x) \le (2+\varepsilon)\cdot\min_{x\in P^k}\mathrm{cost}(F, x)$$

in time

$$\mathrm{Bicriteria} = O(1)\cdot(ndk + \beta\log^2 n).$$

□ 
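The snapping argument in this proof (replace each center $x_i^*$ by its nearest sample point $y_i$, losing at most a factor of 2 for every point of $S$) can be checked on small instances. The sketch below uses Euclidean points in the plane as an example metric; the helper names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum over the points of the distance to the nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def snap(centers: Sequence[Point], S: Sequence[Point]) -> List[Point]:
    """y_i = the point of S closest to x_i^*, for every center x_i^*."""
    return [min(S, key=lambda q: math.dist(q, c)) for c in centers]
```

For every $p \in S$ the bound $\mathrm{dist}(p, y) \le 2\,\mathrm{dist}(p, x^*)$ holds pointwise, so the check below holds for any choice of centers, including centers outside $S$.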

13 From bicriteria to ε-coresets

In this section we analyze the quality of the coresets obtained via algorithm B-CORESET (Figure 6). We present an analysis which will be used in sections to come when we derive results for specific clustering problems.

Algorithm B-CORESET$(F, F', s, m, \varepsilon)$

1 For each $f \in F$, let $t_f : X \to [0,\infty)$ be defined as: $t_f(x) = f'(x)$ if $f'(x) > s_f(x)$, and $t_f(x) = 0$ otherwise.

2 Let $T = \{t_f \mid f \in F\}$.

3 For each $f \in F$ let $g_f : X \to [0,\infty)$ be defined as: $g_f(x) = f(x)/m_f$ if $f'(x) \le s_f(x)$, and $g_f(x) = 0$ otherwise.

4 Let $G_f$ consist of the $m_f$ copies of $g_f$.

5 $G \leftarrow \bigcup_{f\in F} G_f$.

6 $S \leftarrow$ An $\varepsilon$-approximation of $G$.

7 $U \leftarrow \{g\cdot|G|/|S| \mid g \in S\}$.

8 return $C \leftarrow T \cup U$.

Fig. 6: The algorithm B-CORESET.

Theorem 13.1 Let $F$ be a set of functions from $X$ to $[0,\infty]$, and $0 < \varepsilon < 1/4$. Let $s : F \times X \to [0,\infty)$, $(f, x) \mapsto s_f(x)$, and $m : F \to \mathbb{N}\setminus\{0\}$, $f \mapsto m_f$. For each $f \in F$ let $f'$ be a corresponding function associated with $f$, and let $F' = \{f' \mid f \in F\}$. For every $x \in X$, let $M(x) = \{f \in F : f'(x) \le s_f(x)\}$ and assume $f(x) \le 2s_f(x)$ for every $f \in M(x)$. Then for $C = $ B-CORESET$(F, F', s, m, \varepsilon)$ it holds that

$$\forall x \in X:\quad |\mathrm{cost}(F, x) - \mathrm{cost}(C, x)| \le \sum_{f\in F\setminus M(x)}|f(x) - f'(x)| + 2\varepsilon\max_{f\in M(x)}\frac{s_f(x)}{m_f}\sum_{f\in F}m_f.$$


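A minimal Python sketch of B-CORESET (Fig. 6) follows, with functions represented as callables on a one-dimensional $X$. It is illustrative only: Line 6 asks for an $\varepsilon$-approximation of $G$, which this sketch replaces by a uniform sample drawn with replacement (an assumption; a random sample is an $\varepsilon$-approximation only with some probability and only for a large enough sample size). With `sample_size=None` it uses $S = G$, in which case $U = G$ and the coreset is exact whenever $f' = f$. The names `b_coreset` and `total_cost` are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Fn = Callable[[float], float]


def b_coreset(F: Sequence[Fn], F_prime: Sequence[Fn], s: Sequence[Fn],
              m: Sequence[int], sample_size: Optional[int] = None,
              rng: Optional[random.Random] = None) -> List[Fn]:
    """Sketch of B-CORESET(F, F', s, m, eps); s[i] plays the role of s_f for f = F[i]."""
    rng = rng or random.Random(0)
    # Lines 1-2: t_f(x) = f'(x) if f'(x) > s_f(x), and 0 otherwise.
    T = [(lambda x, fp=fp, sf=sf: fp(x) if fp(x) > sf(x) else 0.0)
         for fp, sf in zip(F_prime, s)]
    # Lines 3-5: g_f(x) = f(x)/m_f if f'(x) <= s_f(x), and 0 otherwise; G holds m_f copies.
    G: List[Fn] = []
    for f, fp, sf, mf in zip(F, F_prime, s, m):
        def g(x, f=f, fp=fp, sf=sf, mf=mf):
            return f(x) / mf if fp(x) <= sf(x) else 0.0
        G.extend([g] * mf)
    # Line 6 (ASSUMPTION): a uniform sample stands in for an eps-approximation of G.
    S = G if sample_size is None else [rng.choice(G) for _ in range(sample_size)]
    # Line 7: U = { g * |G| / |S| : g in S }.
    w = len(G) / len(S)
    U = [(lambda x, g=g, w=w: w * g(x)) for g in S]
    return T + U                                   # Line 8: C = T union U


def total_cost(funcs: Sequence[Fn], x: float) -> float:
    return sum(f(x) for f in funcs)
```

The design mirrors the proof below: $T$ carries the functions with $f'(x) > s_f(x)$ exactly, while only the bounded part $G$ is compressed.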

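Proof. Fix $x \in X$, and let $M = \{f \in F : f'(x) \le s_f(x)\}$. For every $f \in M$, we have $\mathrm{cost}(G_f, x) = m_f\cdot g_f(x) = f(x)$. Moreover, by definition, for every $f \notin M$ we have $\mathrm{cost}(G_f, x) = 0$. Hence,

$$\mathrm{cost}(F, x) = \mathrm{cost}(F\setminus M, x) + \sum_{f\in M}f(x) = \mathrm{cost}(F\setminus M, x) + \mathrm{cost}(G, x). \tag{65}$$

The first term in the right-hand side is approximated by $T$, up to an error of

$$|\mathrm{cost}(F\setminus M, x) - \mathrm{cost}(T, x)| = \Big|\sum_{f\in F\setminus M}\big(f(x) - f'(x)\big)\Big| \le \sum_{f\in F\setminus M}|f(x) - f'(x)|. \tag{66}$$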

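Since $S$ is an $\varepsilon$-approximation of $G$, by Lemma 6.8 we obtain

$$\left|\frac{\mathrm{cost}(G, x)}{|G|} - \frac{\mathrm{cost}(S, x)}{|S|}\right| \le \varepsilon\cdot\max_{g_f\in G}g_f(x).$$

By Step 3 of our algorithm, for every $g_f \in G$ with $g_f(x) > 0$ we have $f'(x) \le s_f(x)$, i.e., $f \in M$. By the assumption $f(x) \le 2s_f(x)$ of the theorem, we thus obtain

$$\max_{g_f\in G}g_f(x) \le \max_{f\in M}\frac{f(x)}{m_f} \le \max_{f\in M}\frac{2s_f(x)}{m_f}.$$

By the last two equations,

$$\left|\frac{\mathrm{cost}(G, x)}{|G|} - \frac{\mathrm{cost}(S, x)}{|S|}\right| \le \varepsilon\cdot\max_{f\in M}\frac{2s_f(x)}{m_f}.$$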
Multiplying this equation by $|G|$ yields

$$\left|\mathrm{cost}(G, x) - \frac{|G|}{|S|}\cdot\mathrm{cost}(S, x)\right| \le \varepsilon|G|\cdot\max_{f\in M}\frac{2s_f(x)}{m_f}.$$

Recall that $U = \{g_f\cdot|G|/|S| \mid g_f \in S\}$. Together with the previous two inequalities, we obtain

$$|\mathrm{cost}(G, x) - \mathrm{cost}(U, x)| = \left|\mathrm{cost}(G, x) - \frac{|G|}{|S|}\cdot\mathrm{cost}(S, x)\right| \le \varepsilon|G|\cdot\max_{f\in M}\frac{2s_f(x)}{m_f}. \tag{67}$$


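We have $\mathrm{cost}(C, x) = \mathrm{cost}(T, x) + \mathrm{cost}(U, x)$. Hence, combining (66) and (67) with the triangle inequality yields

$$\begin{aligned}|\mathrm{cost}(F\setminus M, x) + \mathrm{cost}(G, x) - \mathrm{cost}(C, x)| &= |\mathrm{cost}(F\setminus M, x) + \mathrm{cost}(G, x) - \mathrm{cost}(T, x) - \mathrm{cost}(U, x)|\\ &\le |\mathrm{cost}(F\setminus M, x) - \mathrm{cost}(T, x)| + |\mathrm{cost}(G, x) - \mathrm{cost}(U, x)|\\ &\le \sum_{f\in F\setminus M}|f(x) - f'(x)| + \varepsilon|G|\cdot\max_{f\in M}\frac{2s_f(x)}{m_f}.\end{aligned}$$

Using (65), this proves the theorem, as $|G| = \sum_{f\in F}m_f$ and

$$\begin{aligned}|\mathrm{cost}(F, x) - \mathrm{cost}(C, x)| &= |\mathrm{cost}(F\setminus M, x) + \mathrm{cost}(G, x) - \mathrm{cost}(C, x)|\\ &\le \sum_{f\in F\setminus M}|f(x) - f'(x)| + \varepsilon|G|\cdot\max_{f\in M}\frac{2s_f(x)}{m_f}\\ &= \sum_{f\in F\setminus M}|f(x) - f'(x)| + 2\varepsilon\max_{f\in M}\frac{s_f(x)}{m_f}\sum_{f\in F}m_f.\end{aligned}$$

□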



We now present a few corollaries of Theorem 13.1 that will be used in the sections to come. 

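Corollary 13.2 Let $F$, $X$, $s$, $M$ and $\varepsilon$ be defined as in Theorem 13.1. Let $\delta > 0$. Suppose that for every $x \in X$ and $f \in M(x)$ we have

$$m_f \ge \frac{s_f(x)}{\delta\cdot\mathrm{cost}(F, x)},$$

and, for every $f \in F\setminus M(x)$,

$$|f(x) - f'(x)| \le \varepsilon\delta f(x).$$

Then for $C = $ B-CORESET$(F, F', s, m, \varepsilon)$ it holds that

$$\forall x \in X:\quad |\mathrm{cost}(F, x) - \mathrm{cost}(C, x)| \le \varepsilon\delta\,\mathrm{cost}(F, x)\Big(1 + 2\sum_{f\in F}m_f\Big).$$

Proof. Put $x \in X$. For every $f \in M(x)$, we have $s_f(x)/m_f \le \delta\cdot\mathrm{cost}(F, x)$, and thus

$$\max_{f\in M(x)}\frac{s_f(x)}{m_f}\sum_{f\in F}m_f \le \delta\,\mathrm{cost}(F, x)\sum_{f\in F}m_f.$$

For every $f \in F\setminus M(x)$, we have $|f(x) - f'(x)| \le \varepsilon\delta f(x)$, and thus

$$\sum_{f\in F\setminus M(x)}|f(x) - f'(x)| \le \sum_{f\in F\setminus M(x)}\varepsilon\delta f(x) \le \varepsilon\delta\,\mathrm{cost}(F, x).$$

The corollary follows by applying Theorem 13.1 using the last inequalities. □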






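Corollary 13.3 Let $F$, $X$, $F'$, $s$ and $\varepsilon$ be defined as in Theorem 13.1. Let $B \in X$ and $r > 0$. Suppose that for all $x \in X$ and for all $f \in F$ it holds that
$$f'(x) > \frac{f(B)}{r} \implies |f(x) - f'(x)| \le \varepsilon\cdot f(x). \qquad (68)$$
For every $f \in F$ and $x \in X$ assume $s_f(x) = f(B)/r$ and define
$$m_f = \left\lfloor\frac{|F|\,f(B)}{\mathrm{cost}(F, B)}\right\rfloor + 1.$$
Then for $C = \textsc{B-Coreset}(F, F', s, m, r^2)$ it holds that
$$\forall x \in X:\ |\mathrm{cost}(F, x) - \mathrm{cost}(C, x)| \le \varepsilon\,\mathrm{cost}(F, x) + 4r\,\mathrm{cost}(F, B).$$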
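Proof. Put $x \in X$, $M(x) = \{f \in F \mid f'(x) \le s_f(x)\}$, and $f \in F$. If $f \in M(x)$, then using our definitions
$$\frac{s_f(x)}{m_f} = \frac{f(B)}{r\,m_f} \le \frac{\mathrm{cost}(F, B)}{|F|\,r}.$$
Otherwise, $f \notin M(x)$. Thus $f'(x) > s_f(x) = f(B)/r$, so, by (68), $|f(x) - f'(x)| \le \varepsilon\cdot f(x)$. Since $\sum_{f\in F}m_f \le \sum_{f\in F}\big(|F|\,f(B)/\mathrm{cost}(F, B) + 1\big) = 2|F|$, replacing $\varepsilon$ with $r^2$ in Theorem 13.1 yields
$$|\mathrm{cost}(F, x) - \mathrm{cost}(C, x)| \le \sum_{f\in F\setminus M(x)}\varepsilon f(x) + 2r^2\max_{f\in M(x)}\frac{s_f(x)}{m_f}\sum_{f\in F}m_f \le \varepsilon\,\mathrm{cost}(F, x) + 4r\,\mathrm{cost}(F, B).$$
□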
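The two facts used above, $s_f(x)/m_f \le \mathrm{cost}(F,B)/(|F|r)$ and $\sum_f m_f \le 2|F|$, can be checked numerically. The Python sketch below assumes, as in the statement above, that $m_f$ rounds down; the values $f(B)$ are random placeholders, and the hypothetical helper `error_term` evaluates the second term $2r^2\max_f (s_f/m_f)\sum_f m_f$ of Theorem 13.1 so that it can be compared with $4r\,\mathrm{cost}(F,B)$.

```python
import math
import random

def multiplicities(f_at_B):
    """m_f = floor(|F| * f(B) / cost(F, B)) + 1 for each f (rounding down assumed)."""
    n, cost_FB = len(f_at_B), sum(f_at_B)
    return [math.floor(n * v / cost_FB) + 1 for v in f_at_B]

def error_term(f_at_B, r):
    """2 r^2 * max_f (s_f / m_f) * sum_f m_f, with s_f = f(B) / r."""
    m = multiplicities(f_at_B)
    ratio = max((v / r) / mf for v, mf in zip(f_at_B, m))
    return 2 * r**2 * ratio * sum(m)

rng = random.Random(1)
f_at_B = [rng.expovariate(1.0) for _ in range(200)]  # placeholder values f(B) > 0
r = 0.1
m = multiplicities(f_at_B)
print(sum(m) <= 2 * len(f_at_B), error_term(f_at_B, r) <= 4 * r * sum(f_at_B))
```

With all $f(B)$ equal, every $m_f$ equals 2 and the bound $\sum_f m_f \le 2|F|$ is tight.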



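Corollary 13.4 Let $F$, $X$, $F'$ and $\varepsilon$ be defined as in Theorem 13.1. Let $B \in X$. For $f \in F$, let $m_f$ be an arbitrary positive value, and let $\Delta_f = 3m_f\,\mathrm{cost}(F, B)/\sum_{q\in F}m_q$. Suppose that for all $x \in X$ and for all $f \in F$ it holds that
$$|f(x) - f'(x)| \le \Delta_f. \qquad (69)$$
For every $f \in F$ and $x \in X$, let $h_f(x) = f(x) - f'(x) + \Delta_f$ and $H = \{h_f \mid f \in F\}$. For every $h_f \in H$, let $s_{h_f} = h_f$ and $m_{h_f} = m_f$. Then for $C = \textsc{B-Coreset}(H, \emptyset, s, m, \varepsilon)$ it holds for every $x \in X$ that
$$|\mathrm{cost}(H, x) - \mathrm{cost}(C, x)| = \Big|\mathrm{cost}(F, x) - \Big(\mathrm{cost}(F', x) + \mathrm{cost}(C, x) - \sum_{f\in F}\Delta_f\Big)\Big| \le 12\varepsilon\,\mathrm{cost}(F, B).$$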



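Proof. Let $x \in X$. We have
$$\mathrm{cost}(F, x) = \mathrm{cost}(F', x) + \mathrm{cost}(F, x) - \mathrm{cost}(F', x) = \mathrm{cost}(F', x) + \mathrm{cost}(H, x) - \sum_{f\in F}\Delta_f.$$
Hence,
$$\Big|\mathrm{cost}(F, x) - \Big(\mathrm{cost}(F', x) + \mathrm{cost}(C, x) - \sum_{f\in F}\Delta_f\Big)\Big| = |\mathrm{cost}(H, x) - \mathrm{cost}(C, x)|.$$
By applying Theorem 13.1 with $H$ in place of $F$ and with $F' = \emptyset$, we infer that
$$\begin{aligned}
|\mathrm{cost}(H, x) - \mathrm{cost}(C, x)| &\le 2\varepsilon\max_{h_f\in H}\frac{s_{h_f}(x)}{m_{h_f}}\sum_{f\in F}m_f = 2\varepsilon\max_{h_f\in H}\frac{h_f(x)}{m_f}\sum_{f\in F}m_f\\
&= 2\varepsilon\max_{f\in F}\frac{f(x) - f'(x) + \Delta_f}{m_f}\sum_{f\in F}m_f \le 2\varepsilon\max_{f\in F}\frac{2\Delta_f}{m_f}\sum_{f\in F}m_f\\
&= 2\varepsilon\max_{f\in F}\frac{6m_f\,\mathrm{cost}(F, B)}{m_f\sum_{q\in F}m_q}\sum_{f\in F}m_f = 12\varepsilon\,\mathrm{cost}(F, B).
\end{aligned}$$
In the above we use the fact that $|f(x) - f'(x)| \le \Delta_f$. We conclude that
$$\Big|\mathrm{cost}(F, x) - \Big(\mathrm{cost}(F', x) + \mathrm{cost}(C, x) - \sum_{f\in F}\Delta_f\Big)\Big| \le 12\varepsilon\,\mathrm{cost}(F, B).$$
□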

14 From B-Coresets to Metric B-Coresets 

We now turn to study algorithm B-Coreset when applied to functions $F$ corresponding to a metric space. Namely, we show an improved analysis when $F$ and the bi-criteria solution $B$ correspond to points in a given metric space. We will use the analysis stated in this section in deriving improved results for specific clustering problems.

In what follows, our set of data elements will correspond to points $P$ in a metric space $(\mathcal{M}, \mathrm{dist})$. The set of functions corresponding to $P$ may be referred to as $F$, $G$, $H$, or $L$ depending on our specific application. The bi-criteria solution will also consist of points $B$ in $(\mathcal{M}, \mathrm{dist})$. Finally, we will denote certain subsets of points in $(\mathcal{M}, \mathrm{dist})$ by $S$, and the corresponding sets of functions they represent by $\mathcal{S}$ (as has been common throughout our presentation).

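Definition 14.1 ($G(\cdot)$) Let $P$ and $B$ be two sets of points in a metric space $(\mathcal{M}, \mathrm{dist})$, $t \ge 1$ and let $\varepsilon > 0$.

For $p \in P$, let $p' = \mathrm{proj}(p, B)$, i.e., the closest point in $B$ to $p$. For every $p \in P$, define $m_p$ as in Line 1 of a call to $\textsc{Metric-B-Coreset}(P, B, t, \varepsilon)$. See Fig. 7. For every $p \in P$, let $g_p : \mathcal{M} \to \mathbb{R}^+$ be defined as follows:
$$g_p(x) = \begin{cases}\dfrac{\mathrm{dist}(p, x)}{m_p} & \mathrm{dist}(p', x) \le \dfrac{\mathrm{dist}(p, B)}{\varepsilon}\\[4pt] 0 & \text{otherwise.}\end{cases}$$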
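Algorithm $\textsc{Metric-B-Coreset}(P, B, t, \varepsilon)$
1 for each $p \in P$
      $m_p \leftarrow \left\lfloor\frac{|P|\,\mathrm{dist}(p, B)}{\mathrm{cost}(P, B)}\right\rfloor + 1$
2 Pick a non-uniform random sample $S$ of $t$ points from $P$, where for every $q \in S$ and $p \in P$ we have $q = p$ with probability $m_p/\sum_{z\in P}m_z$
3 For $p \in P$, let $p' = \mathrm{proj}(p, B)$
4 for every $p \in S$ and set $x$ of points, define
      $w(p, x) = \frac{\sum_{z\in P}m_z}{|S|\cdot m_p}$ if $\mathrm{dist}(p', x) \le \mathrm{dist}(p, B)/\varepsilon$, and $w(p, x) = 0$ otherwise.
5 for every $p \in P$ and a set $x$ of points, define
      $w(p', x) = 0$ if $\mathrm{dist}(p', x) \le \mathrm{dist}(p, B)/\varepsilon$, and $w(p', x) = 1$ otherwise.
6 $D \leftarrow S \cup \mathrm{proj}(P, B)$
7 return $(D, S, w)$

Fig. 7: The algorithm Metric-B-Coreset.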
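As an illustration only, Fig. 7 can be sketched in Python for points in the Euclidean plane. The sketch assumes that $m_p$ rounds down, that the sample is drawn with replacement, and that $\mathrm{dist}(p, x)$ is the distance to the closest point of the candidate set $x$; all helper names are hypothetical. Instead of one function $w$, it returns a weight function for sampled points (Line 4), one for projected points (Line 5), and an `estimate(x)` that evaluates $\sum_{p\in D}w(p,x)\,\mathrm{dist}(p,x)$ as in Theorem 14.2.

```python
import math
import random

def dist_to_set(p, centers):
    """dist(p, x): distance from point p to its closest point in `centers`."""
    return min(math.dist(p, c) for c in centers)

def metric_b_coreset(P, B, t, eps, rng=random):
    """Sketch of Metric-B-Coreset(P, B, t, eps) from Fig. 7 (Euclidean points)."""
    d = [dist_to_set(p, B) for p in P]
    cost_PB = sum(d)
    if cost_PB == 0:
        raise ValueError("cost(P, B) must be positive")
    # Line 1: integer multiplicities (rounding down is an assumption)
    m = [math.floor(len(P) * di / cost_PB) + 1 for di in d]
    total_m = sum(m)
    # Line 2: t indices sampled with replacement, Pr[i] = m_i / sum_z m_z
    S = rng.choices(range(len(P)), weights=m, k=t)
    # Line 3: p' = proj(p, B)
    proj = [min(B, key=lambda b: math.dist(p, b)) for p in P]

    def w_sample(i, x):
        """Line 4: weight of the sampled point P[i] for the query x."""
        if dist_to_set(proj[i], x) <= d[i] / eps:
            return total_m / (len(S) * m[i])
        return 0.0

    def w_proj(i, x):
        """Line 5: weight of the projected point proj(P[i], B) for the query x."""
        return 0.0 if dist_to_set(proj[i], x) <= d[i] / eps else 1.0

    def estimate(x):
        """sum over D of w(p, x) * dist(p, x)."""
        return (sum(w_sample(i, x) * dist_to_set(P[i], x) for i in S)
                + sum(w_proj(i, x) * dist_to_set(proj[i], x) for i in range(len(P))))

    D = [P[i] for i in S] + proj  # Line 6, kept as a multiset (list)
    return D, S, w_sample, w_proj, estimate

rng = random.Random(0)
P = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(50)]
B = [(2.0, 2.0), (8.0, 8.0)]
D, S, w_sample, w_proj, estimate = metric_b_coreset(P, B, t=20, eps=0.25, rng=rng)
x = [(3.0, 1.0), (9.0, 6.0)]
print(round(estimate(x), 2), round(sum(dist_to_set(p, x) for p in P), 2))
```

For a query far from all data, every projected point gets weight 1 and every sampled point weight 0, so the estimate reduces to $\mathrm{cost}(\mathrm{proj}(P,B), x)$.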



Notice the close resemblance between the definition of $w(p, x)$ in algorithm Metric-B-Coreset and the definition of $G$. For every $S \subseteq P$, we then define
$$G(S) = G(S, B, \varepsilon) = \{g_p \mid p \in S\}.$$

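Theorem 14.2 Let $(\mathcal{M}, \mathrm{dist})$ be a metric space, $P, B \subseteq \mathcal{M}$, $0 < \varepsilon, \delta < 1/2$, and $t \ge 1$. Let $(D, S, w)$ be the output of a call to the algorithm $\textsc{Metric-B-Coreset}(P, B, t, \varepsilon/2)$, with
$$t \ge \frac{c}{\varepsilon^4}\Big(\dim(G(P), X) + \log\frac{1}{\delta}\Big),$$
for a function space $(G(P, B, \varepsilon/2), X) = (G(P), X)$. Then, with probability at least $1 - \delta$,
$$\forall x \in X(G(S, B, \varepsilon/2)):\ \Big|\mathrm{cost}(P, x) - \sum_{p\in D}w(p, x)\cdot\mathrm{dist}(p, x)\Big| \le \varepsilon\,\mathrm{cost}(P, B) + \varepsilon\,\mathrm{cost}(P, x). \qquad (70)$$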



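Proof. Let $G(S) = G(S, B, \varepsilon/2)$. Let $Y = X(G(S))$. For every $p \in P$, let $f_p, f'_p : Y \to [0, \infty)$ such that $f_p(x) = \mathrm{dist}(p, x)$, $f'_p(x) = \mathrm{dist}(\mathrm{proj}(p, B), x)$, and $m_{f_p} = m_p$. Let $F = \{f_p \mid p \in P\}$ and $F' = \{f'_p \mid p \in P\}$. Put $x \in Y$ and $p \in P$. If $f'_p(x) > 2f_p(B)/\varepsilon$, then, using the triangle inequality,
$$2f_p(B) < \varepsilon f'_p(x) \le \varepsilon f_p(x) + \varepsilon f_p(B).$$
Hence, $2f_p(B)(1 - \varepsilon/2) < \varepsilon f_p(x)$, i.e., $2f_p(B) < (1 + 2\varepsilon)\varepsilon f_p(x) \le 2\varepsilon f_p(x)$, so
$$|f_p(x) - f'_p(x)| \le f_p(B) \le \varepsilon f_p(x). \qquad (71)$$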
Let $C$ be the output of a call to $\textsc{B-Coreset}(F|_Y, F'|_Y, s, m, \varepsilon^2/16)$, where $s_f(x) = 2f(B)/\varepsilon$. Using (71), applying Corollary 13.3 with $r = \varepsilon/4$ yields
$$\forall x \in Y:\ |\mathrm{cost}(F, x) - \mathrm{cost}(C, x)| \le \varepsilon\,\mathrm{cost}(F, x) + \varepsilon\,\mathrm{cost}(F, B). \qquad (72)$$
Let $T$ and $G = \{g_{f_p} \mid p \in P\}$ be the sets that are defined in Lines 2 and 5, respectively, of the above call to B-Coreset; see Fig. 6. Note that $G = G(P, B, \varepsilon/2)$ by Definition 14.1. Therefore, for $G(P) = G(P, B, \varepsilon/2)$ we have that $\dim(G, X) = \dim(G(P), X)$. In addition, it holds that $t_{f'_p}(x) = w(p', x)\,\mathrm{dist}(p', x)$, where $w(p', x)$ is defined in Line 5 of algorithm Metric-B-Coreset. Hence,
$$\mathrm{cost}(T, x) = \sum_{p\in P}w(p', x)\,\mathrm{dist}(p', x). \qquad (73)$$
Let $\mathcal{S} = \{g_{f_p} \mid p \in S\}$. By the construction of $S$, we have that $\mathcal{S}$ is a random sample of $t$ i.i.d. functions from $G$. By using a sufficiently large constant $c$ in Theorem 7.3, with probability at least $1 - \delta$, $\mathcal{S}$ is thus an $\varepsilon^2/16$-approximation of $G|_{X(\mathcal{S})} = G|_Y$.

We have
$$\mathrm{cost}(\mathcal{S}, x) = \sum_{g_{f_p}\in\mathcal{S}}g_{f_p}(x).$$
Also, for $w(p, x)$ defined in algorithm Metric-B-Coreset, notice that our definitions imply that
$$g_{f_p}(x) = \frac{|\mathcal{S}|}{|G|}\cdot w(p, x)\,\mathrm{dist}(p, x).$$
Here we use the fact that $G$ is defined in algorithm B-Coreset to take $m_f$ copies of each $g_f$, so that $|G| = \sum_{q\in P}m_q$.

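Suppose that $\mathcal{S}$ was used in Line 6 of the above call to B-Coreset. Using the last equation and (73) with the construction of $C$ yields
$$\mathrm{cost}(C, x) = \mathrm{cost}(T, x) + \frac{|G|}{|\mathcal{S}|}\cdot\mathrm{cost}(\mathcal{S}, x) = \sum_{p\in P}w(p', x)\,\mathrm{dist}(p', x) + \sum_{p\in S}w(p, x)\,\mathrm{dist}(p, x) = \sum_{p\in D}w(p, x)\,\mathrm{dist}(p, x).$$
We also have $\mathrm{cost}(F, x) = \sum_{p\in P}\mathrm{dist}(p, x) = \mathrm{cost}(P, x)$. By the last two equations and (72), we obtain
$$\forall x \in Y:\ \Big|\mathrm{cost}(P, x) - \sum_{p\in D}w(p, x)\,\mathrm{dist}(p, x)\Big| \le \varepsilon\,\mathrm{cost}(P, x) + \varepsilon\,\mathrm{cost}(P, B).$$
□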
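Inequality (71) only uses the triangle inequality, so it can be spot-checked on random Euclidean instances. In the following illustrative sketch (the helper `check_71` is hypothetical), $f_p(x) = \mathrm{dist}(p, x)$, $f'_p(x) = \mathrm{dist}(\mathrm{proj}(p,B), x)$ and $f_p(B) = \mathrm{dist}(p, B)$; the function returns `None` when the hypothesis $f'_p(x) > 2f_p(B)/\varepsilon$ does not hold.

```python
import math
import random

def dist_to_set(p, centers):
    """dist(p, x): distance from p to its closest point in `centers`."""
    return min(math.dist(p, c) for c in centers)

def check_71(p, B, x, eps):
    """Check the implication behind (71) for one instance.

    Returns None when f'_p(x) <= 2 f_p(B) / eps (hypothesis not met), otherwise
    whether |f_p(x) - f'_p(x)| <= f_p(B) <= eps * f_p(x) holds.
    """
    p_proj = min(B, key=lambda b: math.dist(p, b))
    f_B = dist_to_set(p, B)          # f_p(B)  = dist(p, B)
    f_x = dist_to_set(p, x)          # f_p(x)  = dist(p, x)
    fp_x = dist_to_set(p_proj, x)    # f'_p(x) = dist(proj(p, B), x)
    if fp_x <= 2 * f_B / eps:
        return None
    tol = 1e-9  # absorbs floating-point rounding
    return abs(f_x - fp_x) <= f_B + tol and f_B <= eps * f_x + tol

rng = random.Random(7)
results = []
for _ in range(2000):
    p = (rng.uniform(0, 10), rng.uniform(0, 10))
    B = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(3)]
    x = [(rng.uniform(-100, 100), rng.uniform(-100, 100)) for _ in range(2)]
    results.append(check_71(p, B, x, eps=0.25))
print(results.count(True), results.count(False))
```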



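Definition 14.3 ($H(\cdot)$) Let $P$ and $B$ be two sets of points in a metric space $(\mathcal{M}, \mathrm{dist})$, $t \ge 1$ and let $\varepsilon > 0$.

For $p \in P$, let $p' = \mathrm{proj}(p, B)$, i.e., the closest point in $B$ to $p$. For every $p \in P$, define $m_p$ as in Line 1 of a call to $\textsc{Metric-B-Coreset}(P, B, t, \varepsilon)$. See Fig. 7. For every $p \in P$, let $h_p : \mathcal{M} \to \mathbb{R}^+$ be defined as follows:
$$h_p(x) = \begin{cases}\dfrac{\mathrm{dist}(p', x)}{m_p} & \mathrm{dist}(p', x) \le \dfrac{\mathrm{dist}(p, B)}{\varepsilon}\\[4pt] 0 & \text{otherwise.}\end{cases}$$
For every $S \subseteq P$, we then define
$$H(S) = H(S, B, \varepsilon) = \{h_p \mid p \in S\}.$$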
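Lemma 14.4 Let $t \ge 1$, and $(D, S, w)$ be the output of a call to $\textsc{Metric-B-Coreset}(P, B, t, \varepsilon)$, where $P$, $B$, $\varepsilon$ were defined in Theorem 14.2 and
$$t \ge \frac{c}{\varepsilon^4}\Big(\dim(H(P), X) + \log\frac{1}{\delta}\Big),$$
for a function space $(H(P, B, \varepsilon), X) = (H(P), X)$. Then, with probability at least $1 - \delta$,
$$\forall x \in X(H(S, B, \varepsilon)):\ \Big|\mathrm{cost}(\mathrm{proj}(P, B), x) + \sum_{p\in S}w(p, x)\,\mathrm{dist}(p, x) - \sum_{p\in S}w(p, x)\,\mathrm{dist}(\mathrm{proj}(p, B), x) - \sum_{p\in D}w(p, x)\cdot\mathrm{dist}(p, x)\Big| \le \varepsilon\,\mathrm{cost}(P, B).$$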

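Proof. Let $H(S) = H(S, B, \varepsilon)$. Let $Y = X(H(S))$, $x \in Y$, and let $M = \{p \in P \mid \mathrm{dist}(p', x) \le \mathrm{dist}(p, B)/\varepsilon\}$. Hence,
$$\sum_{p\in D}w(p, x)\cdot\mathrm{dist}(p, x) = \sum_{p\in S}w(p, x)\cdot\mathrm{dist}(p, x) + \mathrm{cost}(\mathrm{proj}(P\setminus M, B), x) = \sum_{p\in S}w(p, x)\cdot\mathrm{dist}(p, x) + \mathrm{cost}(\mathrm{proj}(P, B), x) - \mathrm{cost}(\mathrm{proj}(M, B), x). \qquad (74)$$
Therefore,
$$\Big|\mathrm{cost}(\mathrm{proj}(P, B), x) + \sum_{p\in S}w(p, x)\,\mathrm{dist}(p, x) - \sum_{p\in S}w(p, x)\,\mathrm{dist}(p', x) - \sum_{p\in D}w(p, x)\cdot\mathrm{dist}(p, x)\Big| = \Big|\sum_{p\in S}w(p, x)\cdot\mathrm{dist}(p', x) - \mathrm{cost}(\mathrm{proj}(M, B), x)\Big|. \qquad (75)$$
We now bound the right hand side of (75).

For every $p \in P$, let $f_p : X(H(P)) \to [0, \infty)$ such that
$$f_p(x) = \begin{cases}\mathrm{dist}(p', x) & \mathrm{dist}(p', x) \le \dfrac{\mathrm{dist}(p, B)}{\varepsilon}\\[4pt] 0 & \text{otherwise,}\end{cases}$$
$s_f = f$, $m_{f_p} = m_p$ and $F = \{f_p \mid p \in P\}$. Let $C$ be the output of a call to $\textsc{B-Coreset}(F|_Y, \emptyset, s, m, \varepsilon^2/6)$. Applying Theorem 13.1 with $\varepsilon^2/6$ yields
$$\forall x \in Y:\ |\mathrm{cost}(F, x) - \mathrm{cost}(C, x)| \le \frac{\varepsilon^2}{3}\max_{f_p\in F}\frac{f_p(x)}{m_p}\sum_{f\in F}m_f \le \frac{\varepsilon^2}{3}\cdot\frac{\mathrm{cost}(P, B)}{\varepsilon|P|}\cdot 3|P| \le \varepsilon\,\mathrm{cost}(P, B). \qquad (76)$$
Here, we use the fact that $\sum_{f\in F}m_f \le 3|P|$, and that $f_p(x)/m_p \le \mathrm{dist}(p, B)/(\varepsilon m_p) \le \mathrm{cost}(P, B)/(\varepsilon|P|)$ by the definition of $m_p$. Let $G = \{g_{f_p} \mid p \in P\}$ be the set that is defined in Line 5 of the above call to B-Coreset; see Fig. 6. Note that for every $p \in S$ we have
$$g_{f_p}(x) = \frac{f_p(x)}{m_p} = \frac{|\mathcal{S}|}{|G|}\cdot w(p, x)\cdot\mathrm{dist}(p', x). \qquad (77)$$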
We thus have $G = H(P)$, so $\dim(G, X) = \dim(H(P), X)$. Let $\mathcal{S} = \{g_{f_p} \mid p \in S\} = H(S)$. By the construction of $S$, we have that $\mathcal{S}$ is a random sample of $t$ i.i.d. functions from $G$. By Theorem 7.3, with probability at least $1 - \delta$ we have that $\mathcal{S}$ is an $\varepsilon^2/6$-approximation of $G|_{X(\mathcal{S})} = G|_Y$. Assume that this event indeed occurs.
We have
$$\frac{|G|}{|\mathcal{S}|}\,\mathrm{cost}(\mathcal{S}, x) = \frac{|G|}{|\mathcal{S}|}\sum_{g_f\in\mathcal{S}}g_f(x).$$
Suppose that $\mathcal{S}$ was used in Line 6 of the above call to B-Coreset. By the construction of $C$, we have $\mathrm{cost}(C, x) = (|G|/|\mathcal{S}|)\,\mathrm{cost}(\mathcal{S}, x)$ (here, notice that the set $T$ defined in Line 2 of B-Coreset is empty). Combining the last two equations with (77) yields
$$\mathrm{cost}(C, x) = \frac{|G|}{|\mathcal{S}|}\cdot\mathrm{cost}(\mathcal{S}, x) = \sum_{p\in S}w(p, x)\,f_p(x) = \sum_{p\in S}w(p, x)\,\mathrm{dist}(p', x).$$
Using this with (76), and since $\mathrm{cost}(F, x) = \mathrm{cost}(\mathrm{proj}(M, B), x)$, we obtain
$$\Big|\mathrm{cost}(\mathrm{proj}(M, B), x) - \sum_{p\in S}w(p, x)\,\mathrm{dist}(p', x)\Big| = |\mathrm{cost}(F, x) - \mathrm{cost}(C, x)| \le \varepsilon\,\mathrm{cost}(P, B).$$
Plugging the last equation in (75) then proves the lemma. □



14.1 Generalizations for squared distances and distances to the power of $z \ge 1$

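Lemma 14.5 Let $h$ and $h'$ be two functions from a set $X$ to $[0, \infty)$. Let $z \ge 1$, $x \in X$, $f(x) = (h(x))^z$ and $f'(x) = (h'(x))^z$. Let $B \subseteq X$, $0 < \varepsilon < 1$, and suppose that
$$|h(x) - h'(x)| \le h(B). \qquad (78)$$
Then
$$f'(x) > \frac{(18z)^z f(B)}{\varepsilon^z} \implies |f(x) - f'(x)| \le \varepsilon f(x).$$

Proof. It suffices to prove that for $\varepsilon \le 1/(18z)$, we have
$$f'(x) > \frac{f(B)}{\varepsilon^z} \implies |f(x) - f'(x)| \le 18\varepsilon z f(x). \qquad (79)$$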



Let $a \ge b \ge 0$. We have
$$a^z - b^z = (a - b)\sum_{i=0}^{z-1}a^{i}b^{z-1-i} \le z\,a^{z-1}(a - b) \le 2z\cdot a^{z-1}\cdot(a - b). \qquad (80)$$
(The factorization is for integer $z$; for real $z \ge 1$ the bound $a^z - b^z \le z\,a^{z-1}(a - b)$ follows from the mean value theorem.)
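By substituting $a = \max\{h(x), h'(x)\}$ and $b = \min\{h(x), h'(x)\}$ in (80), we obtain
$$|f(x) - f'(x)| = a^z - b^z \le 2z\cdot a^{z-1}\cdot|h(x) - h'(x)|. \qquad (81)$$
Assume that $f'(x) > f(B)/\varepsilon^z$. By taking the $z$th root, we get $h'(x) > h(B)/\varepsilon$. That is, $h(B) < \varepsilon h'(x)$. Using this with (78) yields
$$h(B) < \varepsilon h'(x) \le \varepsilon(h(x) + h(B)) = \varepsilon h(x) + \varepsilon h(B),$$
i.e.,
$$h(B) < \frac{\varepsilon h(x)}{1 - \varepsilon} \le (1 + 2\varepsilon)\varepsilon h(x) \le 2\varepsilon h(x).$$
Using (78) again, we thus have
$$|h(x) - h'(x)| \le h(B) < 2\varepsilon h(x).$$
Hence,
$$a = \max\{h(x), h'(x)\} \le h(x) + 2\varepsilon h(x) = (1 + 2\varepsilon)h(x).$$
Combining the last two inequalities in (81) yields (79), as
$$|f(x) - f'(x)| \le 2z\cdot(1 + 2\varepsilon)^{z-1}h(x)^{z-1}\cdot 2\varepsilon h(x) = 4z\varepsilon(1 + 2\varepsilon)^{z-1}f(x) \le 4z\varepsilon\Big(1 + \frac{1}{z}\Big)^{z-1}f(x) \le 4\mathrm{e}\,z\varepsilon f(x) < 18z\varepsilon f(x),$$
where we used the assumption $\varepsilon \le 1/(18z)$ (so that $2\varepsilon \le 1/z$), the bound $(1 + 1/z)^{z-1} \le \mathrm{e}$, and $4\mathrm{e} < 18$. □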
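Both (80) and the implication (79) are elementary, so they can be spot-checked numerically. The Python sketch below is only an illustration: it restricts to integer $z$, uses $\varepsilon = 1/(36z)$ (any $\varepsilon \le 1/(18z)$ is allowed by (79)), and draws $h'(x)$ within $h(B)$ of $h(x)$ so that (78) holds up to rounding. The helper names are hypothetical.

```python
import random

def check_80(a, b, z):
    """(80) for a >= b >= 0 and integer z >= 1: a^z - b^z <= 2 z a^(z-1) (a - b)."""
    return a**z - b**z <= 2 * z * a**(z - 1) * (a - b) + 1e-12 * max(1.0, a**z)

def check_79(h_x, hp_x, h_B, z):
    """(79) for one instance, using eps = 1/(36 z).

    Assumes (78), |h(x) - h'(x)| <= h(B). Returns None when the hypothesis
    f'(x) > f(B) / eps^z fails, else whether |f(x) - f'(x)| <= 18 eps z f(x).
    """
    eps = 1 / (36 * z)
    f_x, fp_x, f_B = h_x**z, hp_x**z, h_B**z
    if fp_x <= f_B / eps**z:
        return None
    return abs(f_x - fp_x) <= 18 * eps * z * f_x

rng = random.Random(3)
ok_80 = ok_79 = True
for _ in range(5000):
    z = rng.randint(1, 5)
    a, b = sorted((rng.uniform(0, 10), rng.uniform(0, 10)), reverse=True)
    ok_80 = ok_80 and check_80(a, b, z)
    h_x, h_B = rng.uniform(0, 100), rng.uniform(0, 1)
    hp_x = max(0.0, h_x + rng.uniform(-h_B, h_B))  # keeps (78) up to rounding
    ok_79 = ok_79 and check_79(h_x, hp_x, h_B, z) is not False
print(ok_80, ok_79)
```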

14.2 Smaller Coresets 

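Definition 14.6 ($L(\cdot)$) Let $P$ and $B$ be two sets of points in a metric space $(\mathcal{M}, \mathrm{dist})$, and let $t \ge 1$ and $\varepsilon > 0$. For every $p \in P$, define $m_p$ as in Line 1 of a call to $\textsc{Metric-B-Coreset}(P, B, t, \varepsilon)$. See Fig. 7. For every $p \in P$, let $\ell_p : \mathcal{M} \to \mathbb{R}^+$ be defined as follows:
$$\ell_p(x) = \frac{\mathrm{dist}(p, x) - \mathrm{dist}(\mathrm{proj}(p, B), x)}{m_p} + \frac{3\cdot\mathrm{cost}(P, B)}{\sum_{q\in P}m_q}.$$
For every $S \subseteq P$, we then define
$$L(S) = L(S, B, \varepsilon) = \{\ell_p \mid p \in S\}.$$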

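Theorem 14.7 Let $(\mathcal{M}, \mathrm{dist})$ be a metric space, $P, B \subseteq \mathcal{M}$, $0 < \varepsilon, \delta < 1/2$, and $t \ge 1$. Let $(D, S, w)$ be the output of a call to the algorithm $\textsc{Metric-B-Coreset}(P, B, t, \varepsilon/c)$, with
$$t \ge \frac{c}{\varepsilon^2}\Big(\dim(L(P), X) + \log\frac{1}{\delta}\Big),$$
for a function space $(L(P, B, \varepsilon/c), X) = (L(P), X)$ where $c$ is a sufficiently large constant. For every $p \in S$, let
$$w(p) = \frac{\sum_{q\in P}m_q}{m_p\cdot|S|}.$$
Then, with probability at least $1 - \delta$,
$$\forall x \in X(L(S, B, \varepsilon/c)):\ \Big|\mathrm{cost}(P, x) - \Big(\mathrm{cost}(\mathrm{proj}(P, B), x) + \sum_{p\in S}w(p)\,\mathrm{dist}(p, x) - \sum_{p\in S}w(p)\,\mathrm{dist}(\mathrm{proj}(p, B), x)\Big)\Big| \le \varepsilon\,\mathrm{cost}(P, B). \qquad (82)$$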
Proof. Let $L(S) = L(S, B, \varepsilon/c)$. Let $X = X(L(S))$. For every $p \in P$, let $p' = \mathrm{proj}(p, B)$, and $h_p : X \to [0, \infty)$ be defined as
$$h_p(x) = \mathrm{dist}(p, x) - \mathrm{dist}(p', x) + \frac{3m_p\cdot\mathrm{cost}(P, B)}{\sum_{q\in P}m_q}.$$
Let $s_{h_p} : X \to [0, \infty)$ be defined as $s_{h_p}(x) = h_p(x)$, and $H = \{h_p \mid p \in P\}$. Let $C$ be the output of a call to $\textsc{B-Coreset}(H, \emptyset, s, m, \varepsilon)$; see Fig. 6.

Let $G = \{g_{h_p} \mid p \in P\}$ be the set that is defined in Line 5 of the above call to B-Coreset. Note that for every $p \in P$ we have
$$g_{h_p}(x) = \frac{h_p(x)}{m_p}. \qquad (83)$$
We thus have $G = L(P)$, so $\dim(G, X) = \dim(L(P), X)$. Let $\mathcal{S} = \{g_{h_p} \mid p \in S\} = L(S)$. By the construction of $S$, we have that $\mathcal{S}$ is a random sample of $t$ i.i.d. functions from $G$. By Theorem 7.3, with probability at least $1 - \delta$ we have that $\mathcal{S}$ is an $\varepsilon$-approximation of $G|_{X(\mathcal{S})} = G|_X$. Assume that this event indeed occurs, and suppose that $\mathcal{S}$ was used in Line 6 of the above call to B-Coreset.

Put $x \in X$. We start by proving that the functions $h_p$ are positive. Namely, for $\Delta_p = 3m_p\,\mathrm{cost}(P, B)/\sum_{q\in P}m_q$ we show that $|\mathrm{dist}(p, x) - \mathrm{dist}(p', x)| \le \Delta_p$. By the triangle inequality, for $p \in P$,
$$|\mathrm{dist}(p, x) - \mathrm{dist}(p', x)| \le \mathrm{dist}(p, B).$$
Thus it suffices to prove that
$$\mathrm{dist}(p, B) \le \frac{3m_p\cdot\mathrm{cost}(P, B)}{\sum_{q\in P}m_q}.$$
Now, since $m_p \ge |P|\,\mathrm{dist}(p, B)/\mathrm{cost}(P, B)$ and $\sum_{q\in P}m_q \le 3|P|$,
$$\frac{3m_p\cdot\mathrm{cost}(P, B)}{\sum_{q\in P}m_q} \ge \frac{3|P|\,\mathrm{dist}(p, B)}{\sum_{q\in P}m_q} \ge \mathrm{dist}(p, B).$$

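For every $p \in P$, let $f_p, f'_p : X \to [0, \infty)$ such that $f_p(x) = \mathrm{dist}(p, x)$ and $f'_p(x) = \mathrm{dist}(p', x)$. Let $F = \{f_p \mid p \in P\}$ and $F' = \{f'_p \mid p \in P\}$. By Corollary 13.4,
$$\Big|\mathrm{cost}(F, x) - \Big(\mathrm{cost}(F', x) + \mathrm{cost}(C, x) - \sum_{p\in P}\Delta_p\Big)\Big| \le 12\varepsilon\,\mathrm{cost}(F, B).$$
It also holds that $\sum_{p\in P}\Delta_p = 3\,\mathrm{cost}(F, B)$. Thus,
$$|\mathrm{cost}(F, x) - (\mathrm{cost}(F', x) - 3\,\mathrm{cost}(F, B) + \mathrm{cost}(C, x))| \le 12\varepsilon\,\mathrm{cost}(F, B).$$
We also have
$$\sum_{p\in S}w(p)\cdot\frac{m_p\,\mathrm{cost}(P, B)}{\sum_{q\in P}m_q} = \mathrm{cost}(P, B) = \mathrm{cost}(F, B).$$
Using the last two inequalities, and since $\mathrm{cost}(C, x) = \sum_{p\in S}w(p)\,h_p(x)$,
$$\begin{aligned}
&\Big|\mathrm{cost}(P, x) - \Big(\mathrm{cost}(\mathrm{proj}(P, B), x) + \sum_{p\in S}w(p)\big(\mathrm{dist}(p, x) - \mathrm{dist}(\mathrm{proj}(p, B), x)\big)\Big)\Big|\\
&\quad= \Big|\mathrm{cost}(F, x) - \Big(\mathrm{cost}(F', x) + \mathrm{cost}(C, x) - \sum_{p\in S}w(p)\cdot\frac{3m_p\,\mathrm{cost}(P, B)}{\sum_{q\in P}m_q}\Big)\Big|\\
&\quad= |\mathrm{cost}(F, x) - (\mathrm{cost}(F', x) + \mathrm{cost}(C, x) - 3\,\mathrm{cost}(F, B))| \le 12\varepsilon\,\mathrm{cost}(F, B) = 12\varepsilon\,\mathrm{cost}(P, B).
\end{aligned}$$
Since $c$ is a sufficiently large constant, this proves (82). □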
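The positivity argument for $h_p$ and the identity $\sum_{p\in S}w(p)\,m_p\,\mathrm{cost}(P,B)/\sum_q m_q = \mathrm{cost}(P,B)$ can be checked on random planar data. The sketch below is illustrative only; it assumes $m_p$ rounds down and uses a hypothetical helper `deltas` that returns $\mathrm{dist}(p,B)$, $m_p$ and $\Delta_p = 3m_p\,\mathrm{cost}(P,B)/\sum_q m_q$.

```python
import math
import random

def dist_to_set(p, centers):
    """dist(p, x): distance from p to its closest point in `centers`."""
    return min(math.dist(p, c) for c in centers)

def deltas(P, B):
    """Return (dist(p, B), m_p, Delta_p) lists, with
    m_p = floor(|P| dist(p, B) / cost(P, B)) + 1 (rounding down assumed) and
    Delta_p = 3 m_p cost(P, B) / sum_q m_q."""
    d = [dist_to_set(p, B) for p in P]
    cost_PB = sum(d)
    m = [math.floor(len(P) * di / cost_PB) + 1 for di in d]
    total = sum(m)
    return d, m, [3 * mi * cost_PB / total for mi in m]

rng = random.Random(11)
P = [(rng.gauss(0, 3), rng.gauss(0, 3)) for _ in range(100)]
B = [(0.0, 0.0), (4.0, 4.0)]
d, m, Delta = deltas(P, B)
proj = [min(B, key=lambda b: math.dist(p, b)) for p in P]

# h_p(x) = dist(p, x) - dist(p', x) + Delta_p should be nonnegative for every x.
x = [(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(3)]
h = [dist_to_set(p, x) - dist_to_set(pp, x) + D for p, pp, D in zip(P, proj, Delta)]

# For any sample S: sum over S of w(p) * m_p * cost(P, B) / sum_q m_q = cost(P, B).
S = rng.choices(range(len(P)), weights=m, k=25)
w = {i: sum(m) / (len(S) * m[i]) for i in set(S)}
total = sum(w[i] * m[i] * sum(d) / sum(m) for i in S)
print(min(h) >= 0, math.isclose(total, sum(d)))
```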

15 $k$-Median in a Metric Space

We now present the results obtained by applying our framework on the $k$-median problem in metric spaces. We start by presenting a constant factor approximation. We assume that the time to compute the distance between two points in the metric space is $O(d)$.

15.1 Constant Factor Approximation 

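Theorem 15.1 Let $(P, \mathrm{dist})$ be a metric space of $n$ points. Let $0 < \delta < 1/2$. A set $x \in P^k$ can be computed in $O(ndk + k^2 + \log^2(1/\delta)\log^2 n)$ time, such that, with probability at least $1 - \delta$,
$$\sum_{p\in P}\mathrm{dist}(p, x) \le O(1)\cdot\min_{x^*\in P^k}\sum_{p\in P}\mathrm{dist}(p, x^*).$$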



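Proof. Let $\beta = k + \log(2/\delta)$. Let $x^*$ denote the $k$-tuple that minimizes $\sum_{p\in P}\mathrm{dist}(p, x)$ over every $x \in P^k$. By Theorem 12.17, a set $B \subseteq P$ of $O(\beta\log n)$ points can be computed in $O(1)\cdot(ndk + \log^2(1/\delta)\log^2 n)$ time such that, with probability at least $1 - \delta$,
$$\mathrm{cost}(P, B) \le O(1)\cdot\mathrm{cost}(P, x^*). \qquad (84)$$
Let $x \in B^k$ be a set such that
$$\mathrm{cost}(\mathrm{proj}(P, B), x) \le O(1)\min_{y^*\in P^k}\mathrm{cost}(\mathrm{proj}(P, B), y^*). \qquad (85)$$
Since $\mathrm{proj}(P, B)$ contains at most $|B|$ distinct weighted points, such a set $x$ can be computed in time; see the survey in [MP04].

Fix $p \in P$. Using the triangle inequality,
$$\mathrm{dist}(p, x) \le \mathrm{dist}(p, \mathrm{proj}(p, B)) + \mathrm{dist}(\mathrm{proj}(p, B), x) = \mathrm{dist}(p, B) + \mathrm{dist}(\mathrm{proj}(p, B), x).$$
Summing this over every $p \in P$ yields $\mathrm{cost}(P, x) \le \mathrm{cost}(P, B) + \mathrm{cost}(\mathrm{proj}(P, B), x)$. By this and (85), we obtain
$$\mathrm{cost}(P, x) \le \mathrm{cost}(P, B) + O(1)\cdot\mathrm{cost}(\mathrm{proj}(P, B), x^*). \qquad (86)$$
Fix $p \in P$. Using the triangle inequality,
$$\mathrm{dist}(\mathrm{proj}(p, B), x^*) \le \mathrm{dist}(\mathrm{proj}(p, B), p) + \mathrm{dist}(p, x^*) = \mathrm{dist}(p, B) + \mathrm{dist}(p, x^*).$$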
Algorithm $\textsc{$k$-Median-Coreset}(P, B, t, \varepsilon)$
1 for each $b \in B$
2     $P_b \leftarrow$ the set of points in $P$ whose closest point in $B$ is $b$. Ties are broken arbitrarily.
3 for each $b \in B$ and $p \in P_b$
      $m_p \leftarrow \left\lfloor\frac{|P|\,\mathrm{dist}(p, B)}{\mathrm{cost}(P, B)}\right\rfloor + 1$
4 Pick a non-uniform random sample $S$ of $t$ points from $P$, where the probability that a point in $S$ equals $p \in P$ is $m_p/\sum_{q\in P}m_q$.
5 for each $p \in S$
      $w(p) \leftarrow \frac{\sum_{q\in P}m_q}{|S|\cdot m_p}$
6 for each $b \in B$
7     $w(b) \leftarrow (1 + 10\varepsilon)|P_b| - \sum_{p\in S\cap P_b}w(p)$
8 $D \leftarrow S \cup B$
9 return $(D, S, w)$

Fig. 8: The algorithm $k$-Median-Coreset.



Summing this over every $p \in P$ yields $\mathrm{cost}(\mathrm{proj}(P, B), x^*) \le \mathrm{cost}(P, B) + \mathrm{cost}(P, x^*)$. Using (86) with the last inequality yields
$$\mathrm{cost}(P, x) \le \mathrm{cost}(P, B) + O(1)\cdot\mathrm{cost}(\mathrm{proj}(P, B), x^*) \le O(1)\cdot\mathrm{cost}(P, B) + O(1)\cdot\mathrm{cost}(P, x^*).$$
By this and (84), we obtain $\mathrm{cost}(P, x) \le O(1)\,\mathrm{cost}(P, x^*)$, which proves this theorem. □
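Algorithm k-Median-Coreset in Fig. 8 can be sketched in Python for Euclidean points as follows. This is an illustration under stated assumptions, not the authors' implementation: $m_p$ rounds down, ties in Line 2 go to the lowest-index center, sampling is with replacement, and $D$ is returned as a list of (point, weight) pairs in which sampled points keep their multiplicity.

```python
import math
import random

def k_median_coreset(P, B, t, eps, rng=random):
    """Sketch of k-Median-Coreset(P, B, t, eps) from Fig. 8 (Euclidean points).

    Returns (D, S): D is a list of (point, weight) pairs, S the sampled indices.
    """
    # Lines 1-2: P_b = points whose closest point in B is b (ties: lowest index)
    owner = [min(range(len(B)), key=lambda j: math.dist(p, B[j])) for p in P]
    d = [math.dist(p, B[j]) for p, j in zip(P, owner)]
    cost_PB = sum(d)
    # Line 3: multiplicities m_p (rounding down is an assumption)
    m = [math.floor(len(P) * di / cost_PB) + 1 for di in d]
    total = sum(m)
    # Line 4: sample t indices with Pr[p] = m_p / sum_q m_q
    S = rng.choices(range(len(P)), weights=m, k=t)
    # Line 5: w(p) = sum_q m_q / (|S| m_p)
    w_S = [total / (len(S) * m[i]) for i in S]
    # Lines 6-7: w(b) = (1 + 10 eps)|P_b| - sum of w(p) over sampled p in P_b
    w_B = []
    for j in range(len(B)):
        size = owner.count(j)
        sampled = sum(w for i, w in zip(S, w_S) if owner[i] == j)
        w_B.append((1 + 10 * eps) * size - sampled)
    # Line 8: D = S together with B
    D = [(P[i], w) for i, w in zip(S, w_S)] + list(zip(B, w_B))
    return D, S

rng = random.Random(5)
P = [(rng.gauss(c, 1.0), rng.gauss(c, 1.0)) for c in (0, 6) for _ in range(100)]
B = [(0.0, 0.0), (6.0, 6.0)]
D, S = k_median_coreset(P, B, t=60, eps=0.1, rng=rng)
x = [(1.0, -1.0), (5.0, 7.0)]
approx = sum(w * min(math.dist(q, c) for c in x) for q, w in D)
exact = sum(min(math.dist(p, c) for c in x) for p in P)
print(len(D), round(approx, 2), round(exact, 2))
```

By construction the weights of $D$ always sum to $(1 + 10\varepsilon)|P|$; Corollary 15.3 addresses when the center weights $w(b)$ are non-negative.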



15.2 Strong Coresets for Metric $k$-Median

The following is a generalization of Theorem 6.3, as it appeared in [LLS00]. Although the original claim uses another definition of dimensionality (analogous to the VC-dimension), it can be easily verified that it also holds for our weaker definition of dimensionality.

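Theorem 15.2 ([LLS00]) Let $F$ be a set of functions from $X$ to $[0, 1]$, and let $u, v, \delta > 0$. Let $\mathrm{pr} : F \to [0, 1]$ be a distribution on $F$. Let $c$ be a sufficiently large constant. Let $\mathcal{S}$ be a non-uniform random sample of
$$|\mathcal{S}| = \frac{c}{u^2 v}\Big(\dim(F)\cdot\log\frac{1}{v} + \log\frac{1}{\delta}\Big) \qquad (87)$$
functions from $F$, where for every $s \in \mathcal{S}$ and $f \in F$, we have $\Pr(s = f) = \mathrm{pr}(f)$. Then, with probability at least $1 - \delta$,
$$\forall x \in X:\ \frac{|\bar{f}(x) - \bar{s}(x)|}{\bar{f}(x) + \bar{s}(x) + v} \le u,$$
where $\bar{f}(x) = \sum_{f\in F}\mathrm{pr}(f)\cdot f(x)$ and $\bar{s}(x) = \sum_{f\in\mathcal{S}}f(x)/|\mathcal{S}|$.

We start by proving a technical lemma regarding the weights defined in algorithm $\textsc{$k$-Median-Coreset}(P, B, t, \varepsilon)$; see Fig. 8.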
Corollary 15.3 Let $P$, $B$ be two finite sets of points in a metric space, and $0 < \delta, \varepsilon < 1/2$. Let $c$ be the constant from Theorem 15.2, and

i > ^(31og|B|+log(l/5)). 

Let $(D, S, w)$ be the output of a call to the algorithm $\textsc{$k$-Median-Coreset}(P, B, t, \varepsilon)$; see Fig. 8. Then, with probability at least $1 - \delta$, we have
$$\forall p \in D:\ w(p) \ge 0.$$

Proof. Let u = e and v = 1/(2"/^ Let 5 be the sample that is constructed during the execution of 
Line 4 of the algorithm; see Fig. 8. Hence, 



= t>^(21og(|i3|) + log(|i?|/<5)) 
>^flog(2^/2|5|) + log(|5|/(5)) 



(88) 



> 



c 



;iog(l/7;))+log(|i?|/J)). 



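For every $p \in P$, define $f_p : B \to [0, 1]$ as
$$f_p(b) = \begin{cases}\dfrac{|P|}{|B|\cdot|P_b|\cdot m_p} & p \in P_b\\[4pt] 0 & \text{otherwise.}\end{cases}$$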



Let $F = \{f_p \mid p \in P\}$ and $\mathcal{S} = \{f_p \in F \mid p \in S\}$. By the construction of $S$, for every $f \in F$ and $s \in \mathcal{S}$, we have $s = f_p$ with probability $\mathrm{pr}(p) = m_p/\sum_{q\in P}m_q$. Apply Theorem 15.2 with $\delta/|B|$, $d = 1$ and $X = \{b\}$ for some fixed $b \in B$, and infer that, with probability at least $1 - \delta/|B|$,
$$\frac{|\bar{f}(b) - \bar{s}(b)|}{\bar{f}(b) + \bar{s}(b) + v} \le u, \qquad (89)$$
where $\bar{f}(b) = \sum_{f_p\in F}\mathrm{pr}(p)f_p(b)$ and $\bar{s}(b) = \sum_{f_p\in\mathcal{S}}f_p(b)/|\mathcal{S}|$. Assume that (89) holds for every $b \in B$, which, by the union bound, happens with probability at least $1 - \delta$.
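By (89),
$$\bar{s}(b) \le \bar{f}(b) + u(\bar{f}(b) + \bar{s}(b) + v) = \bar{f}(b)(1 + u) + u\bar{s}(b) + uv.$$
That is, $\bar{s}(b) \le (uv + \bar{f}(b)(1 + u))/(1 - u)$. Since $u \le \varepsilon < 1/2$, we obtain
$$\bar{s}(b) \le (uv + \bar{f}(b)(1 + u))(1 + 2u) \le 2uv + \bar{f}(b)(1 + 4u) \le \frac{2\varepsilon}{|B|} + \bar{f}(b)(1 + 4\varepsilon).$$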



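We have
$$\bar{f}(b) = \sum_{p\in P}\mathrm{pr}(p)f_p(b) = \sum_{p\in P_b}\frac{m_p}{\sum_{q\in P}m_q}\cdot\frac{|P|}{|B|\cdot|P_b|\cdot m_p} = \frac{|P|}{|B|\sum_{q\in P}m_q}.$$
By the last two inequalities,
$$\bar{s}(b) \le \frac{2\varepsilon}{|B|} + \frac{(1 + 4\varepsilon)|P|}{|B|\sum_{q\in P}m_q}.$$
Since $1 \le 3|P|/\sum_{q\in P}m_q$ (by the definition of $m_p$), we obtain
$$\bar{s}(b) \le \frac{2\varepsilon\cdot 3|P|}{|B|\sum_{q\in P}m_q} + \frac{(1 + 4\varepsilon)|P|}{|B|\sum_{q\in P}m_q} \le \frac{(1 + 10\varepsilon)|P|}{|B|\sum_{q\in P}m_q}. \qquad (90)$$
For every $p \in P_b$, we have $w(p) = \sum_{q\in P}m_q/(|S|\cdot m_p)$. Hence,
$$\sum_{p\in S\cap P_b}w(p) = \sum_{p\in S}\frac{|B|\cdot|P_b|\cdot\sum_{q\in P}m_q\cdot f_p(b)}{|P|\cdot|S|} = \frac{|B|\cdot|P_b|\cdot\sum_{q\in P}m_q}{|P|}\cdot\bar{s}(b).$$
By this and (90),
$$\sum_{p\in S\cap P_b}w(p) \le (1 + 10\varepsilon)|P_b|.$$
Hence,
$$w(b) = (1 + 10\varepsilon)|P_b| - \sum_{p\in S\cap P_b}w(p) \ge 0.$$
Together with the fact that $w(p) > 0$ for every $p \in S$, we conclude that $w(p) \ge 0$ for every $p \in S \cup B = D$.

□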
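The two exact identities in the proof, $\bar{f}(b) = |P|/(|B|\sum_q m_q)$ and $\sum_{p\in S\cap P_b}w(p) = (|B|\,|P_b|\sum_q m_q/|P|)\,\bar{s}(b)$, hold for every sample $S$, so they can be verified with exact rational arithmetic. In this illustrative Python sketch the cluster assignment `owner`, the multiplicities `m` and the sample `S` are small hypothetical inputs, and $f_p(b)$ is taken as defined above.

```python
from fractions import Fraction

def check_identities(owner, m, S, num_centers):
    """Check the two exact identities used in the proof of Corollary 15.3.

    owner[i] is the index b of the cluster P_b containing point i, m[i] is m_p,
    and S lists the sampled indices (with repetition).
    """
    n, total = len(owner), sum(m)
    for b in range(num_centers):
        size = owner.count(b)  # |P_b|
        if size == 0:
            continue
        # f_p(b) = |P| / (|B| |P_b| m_p) for p in P_b, and 0 otherwise
        f = [Fraction(n, num_centers * size * m[i]) if owner[i] == b else Fraction(0)
             for i in range(n)]
        f_bar = sum(Fraction(m[i], total) * f[i] for i in range(n))
        s_bar = sum(f[i] for i in S) / len(S)
        w_sum = sum(Fraction(total, len(S) * m[i]) for i in S if owner[i] == b)
        if f_bar != Fraction(n, num_centers * total):
            return False
        if w_sum != Fraction(num_centers * size * total, n) * s_bar:
            return False
    return True

owner = [0, 0, 1, 1, 1, 2]
m = [1, 3, 2, 1, 1, 5]
S = [1, 1, 4, 5, 0]
print(check_identities(owner, m, S, num_centers=3))  # prints: True
```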

We are now ready to address strong coresets for metric $k$-median.
Theorem 15.4 Let $(P, \mathrm{dist})$ be a metric space of $n$ points. Let $0 < \varepsilon, \delta < 1/2$, and
$$t = \frac{c}{\varepsilon^2}\Big(k\log n + \log\frac{1}{\delta}\Big),$$
where $c$ is a sufficiently large constant. Then a set $D \subseteq P$, $|D| = t$, with a weight function $w : D \to [0, \infty)$ can be computed such that, with probability at least $1 - \delta$,
$$\forall x \in P^k:\ \Big|\sum_{p\in P}\mathrm{dist}(p, x) - \sum_{p\in D}w(p)\,\mathrm{dist}(p, x)\Big| \le \varepsilon\sum_{p\in P}\mathrm{dist}(p, x).$$
The running time is $O(nk + \log^2(1/\delta)\log^2 n + k^2 + t\log n)$.



Proof. By Theorem 15.1, a set $B \subseteq P$ of $k$ points can be computed in $O(nk) + (k + \log(2/\delta)\log n)^2$ time such that, with probability at least $1 - \delta$,
$$\mathrm{cost}(P, B) \le O(1)\min_{x\in P^k}\mathrm{cost}(P, x). \qquad (91)$$
Consider the set of functions $L(P)$; see Definition 14.6. Since $|P| = n$, a query $x$ ranges over only $n$ points in the case $k = 1$, so $\dim(L(P)) = O(\log n)$ for this case. Using Lemma 6.5, $\dim(L(P)) = O(k\log n)$ for any $k \ge 1$.

Let $(D, S, w)$ be the output of a call to the algorithm $\textsc{$k$-Median-Coreset}(P, B, t, \varepsilon)$. By Corollary 15.3, with probability at least $1 - \delta$, the weight function $w$ is non-negative. Assume that this event indeed occurs. Let $(D', S', w')$ be the output of a call to the algorithm $\textsc{Metric-B-Coreset}(P, B, t, \varepsilon)$. Since $S$ and $S'$ have the same distribution, we assume w.l.o.g. that $S = S'$.
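By Theorem 14.7, with probability at least $1 - \delta$,
$$\Big|\mathrm{cost}(P, x) - \Big(\mathrm{cost}(\mathrm{proj}(P, B), x) + \sum_{p\in S}w(p)\,\mathrm{dist}(p, x) - \sum_{p\in S}w(p)\,\mathrm{dist}(\mathrm{proj}(p, B), x)\Big)\Big| \le O(\varepsilon)\,\mathrm{cost}(P, B). \qquad (92)$$
Assume that (92) indeed holds. Since $\mathrm{proj}(p, B) = b$ for every $p \in P_b$, we have
$$\begin{aligned}
&\mathrm{cost}(\mathrm{proj}(P, B), x) + \sum_{p\in S}w(p)\,\mathrm{dist}(p, x) - \sum_{p\in S}w(p)\,\mathrm{dist}(\mathrm{proj}(p, B), x)\\
&\quad= \sum_{b\in B}\Big(|P_b| - \sum_{p\in S\cap P_b}w(p)\Big)\cdot\mathrm{dist}(b, x) + \sum_{p\in S}w(p)\,\mathrm{dist}(p, x) \qquad (93)\\
&\quad= \sum_{p\in D}w(p)\,\mathrm{dist}(p, x) - \sum_{b\in B}10\varepsilon|P_b|\,\mathrm{dist}(b, x).
\end{aligned}$$
For every $p \in P_b$, we have $\mathrm{dist}(b, x) \le \mathrm{dist}(b, p) + \mathrm{dist}(p, x) = \mathrm{dist}(p, B) + \mathrm{dist}(p, x)$. Summing over every $p \in P_b$ and $b \in B$ yields
$$\sum_{b\in B}|P_b|\,\mathrm{dist}(b, x) \le \mathrm{cost}(P, B) + \mathrm{cost}(P, x).$$
Hence,
$$\sum_{b\in B}10\varepsilon|P_b|\,\mathrm{dist}(b, x) \le O(\varepsilon)\,\mathrm{cost}(P, B) + O(\varepsilon)\,\mathrm{cost}(P, x).$$
Combining the last inequality with (92) and (93) yields
$$\forall x \in P^k:\ \Big|\mathrm{cost}(P, x) - \sum_{p\in D}w(p)\,\mathrm{dist}(p, x)\Big| \le O(\varepsilon)\,\mathrm{cost}(P, B) + O(\varepsilon)\,\mathrm{cost}(P, x).$$
Given $B$, the set $D$ can be constructed in $O(nk + t\log n)$ time (for example, using a binary search tree). By using (91) and a sufficiently large constant $c$, this proves the theorem. □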
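Equation (93) is a purely algebraic rearrangement that holds for any sample weights once $w(b)$ is set as in Line 7 of Fig. 8. The following Python sketch, an illustration with hypothetical random data and helper names, evaluates the first and last expressions of (93) for Euclidean points and arbitrary positive sample weights.

```python
import math
import random

def dist_to_set(p, centers):
    """dist(p, x): distance from p to its closest point in `centers`."""
    return min(math.dist(p, c) for c in centers)

def sides_of_93(P, B, S, w_S, eps, x):
    """Evaluate the first and last expressions of (93).

    S: sampled indices into P; w_S[k]: weight of the k-th sampled point.
    Returns (left, right), which should agree up to rounding.
    """
    owner = [min(range(len(B)), key=lambda j: math.dist(p, B[j])) for p in P]
    sizes = [owner.count(j) for j in range(len(B))]
    w_B = [(1 + 10 * eps) * sizes[j]
           - sum(w for i, w in zip(S, w_S) if owner[i] == j) for j in range(len(B))]
    sampled = sum(w * dist_to_set(P[i], x) for i, w in zip(S, w_S))
    left = (sum(dist_to_set(B[j], x) for j in owner)  # cost(proj(P, B), x)
            + sampled
            - sum(w * dist_to_set(B[owner[i]], x) for i, w in zip(S, w_S)))
    right = (sampled + sum(wb * dist_to_set(b, x) for b, wb in zip(B, w_B))  # sum over D
             - sum(10 * eps * sizes[j] * dist_to_set(B[j], x) for j in range(len(B))))
    return left, right

rng = random.Random(9)
P = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(40)]
B = [(2.0, 3.0), (7.0, 7.0), (9.0, 1.0)]
S = [rng.randrange(len(P)) for _ in range(12)]
w_S = [rng.uniform(0.5, 5.0) for _ in S]
left, right = sides_of_93(P, B, S, w_S, eps=0.05, x=[(5.0, 5.0), (0.0, 9.0)])
print(math.isclose(left, right))
```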

16 $k$-Median in $\mathbb{R}^d$

In the upcoming section we address the special case of $k$-median in $\mathbb{R}^d$.
16.1 Strong Coresets

We start by stating a technical lemma addressing arrangements of balls in $\mathbb{R}^d$.

Lemma 16.1 Let $P$ be a set of points in $\mathbb{R}^d$, and let $c \in \mathbb{R}$. For every $p \in P$, let $h : P \to \mathbb{R}^d$ be a mapping from every $p \in P$ to a point $p' = h(p)$. For every $p \in P$, let $f_p : X(j, 1) \to [0, \infty)$ be defined as $f_p(x) = \mathrm{dist}(p, x) - \mathrm{dist}(p', x) + c$. Then the dimension of $F = \{f_p \mid p \in P\}$ is $O(d(j + 1))$.
Proof. Put $S \subseteq P$. For every $x \in X(j, 1)$ and $r \in \mathbb{R}$, let
$$\mathrm{range}(x, r) = \{p \in S \mid \mathrm{dist}(p, x) - \mathrm{dist}(p', x) \le r - c\}.$$
Let $R^+ = \{\mathrm{range}(x, r) \mid x \in X(j, 1), r - c \ge 0\}$, and $R^- = \{\mathrm{range}(x, r) \mid x \in X(j, 1), r - c < 0\}$. We have
$$|\{\mathrm{range}(x, r) \mid x \in X(j, 1), r \ge 0\}| \le |\{\mathrm{range}(x, r) \mid x \in X(j, 1), r \in \mathbb{R}\}| \le |R^+| + |R^-|. \qquad (94)$$
We now bound $|R^+|$ and then $|R^-|$.

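Let $r \in \mathbb{R}$, $p \in S$ and $x \in X(j, 1)$. We define $\mathrm{dist}^2(p, x) := (\mathrm{dist}(p, x))^2$. Since $x$ is a $j$-flat, there is a tuple of $j + 1$ vectors $h_0, \ldots, h_j \in \mathbb{R}^d$, where $h_1, \ldots, h_j$ are orthonormal, such that $x = \{h_0 + \sum_{i=1}^{j}a_i h_i \mid a_1, \ldots, a_j \in \mathbb{R}\}$, and
$$\mathrm{dist}^2(p, x) = \|p - h_0\|_2^2 - \sum_{i=1}^{j}\big((p - h_0)^T h_i\big)^2 = \|p - h_0\|_2^2 - \sum_{i=1}^{j}\big(p^T h_i - h_0^T h_i\big)^2.$$
For two vectors $(m_1, \ldots, m_s) \in \mathbb{R}^s$ and $(y_1, \ldots, y_t) \in \mathbb{R}^t$, we denote by $my$ the tuple $(m_1, \ldots, m_s, y_1, \ldots, y_t)$. Let $h = (1, r - c, h_0 \cdots h_j) \in \mathbb{R}^{d(j+1)+2}$ and $g = (1, pp') \in \mathbb{R}^{2d+1}$ where $p' = h(p)$. Hence, we have
$$\mathrm{dist}^2(p, x) - \mathrm{dist}^2(p', x) - (r - c)^2 = \sum_{i_0, i_1\in[2d+1],\ i_2, i_3\in[d(j+1)+2]}C_{i_0, i_1, i_2, i_3}\,g_{i_0}g_{i_1}h_{i_2}h_{i_3}, \qquad (95)$$
where $C_{i_0, i_1, i_2, i_3}$ is a constant that depends only on $i_0, \ldots, i_3$, and equals zero for all except $d_1 = O(d(j + 1))$ terms of the summation. Equation (95) implies that there are two $d_1$-dimensional vectors $u_1 = u_1(p)$ and $v_1 = v_1(x, r, c)$, such that
$$u_1^T v_1 > 0 \iff \mathrm{dist}^2(p, x) - \mathrm{dist}^2(p', x) - (r - c)^2 > 0. \qquad (96)$$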

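Similarly,
$$\big(\mathrm{dist}^2(p, x) - \mathrm{dist}^2(p', x) - (r - c)^2\big)^2 - \big(2(r - c)\,\mathrm{dist}(p', x)\big)^2 = \sum_{i_0, \ldots, i_3\in[2d+1],\ i_4, \ldots, i_7\in[d(j+1)+2]}C'_{i_0, \ldots, i_7}\,g_{i_0}\cdots g_{i_3}h_{i_4}\cdots h_{i_7},$$
where $C'_{i_0, \ldots, i_7}$ is a constant that depends only on $i_0, \ldots, i_7$ and equals zero for all except $d_2 = O(d(j + 1))$ terms. Hence, there are two $d_2$-dimensional vectors, $u_2 = u_2(p)$ and $v_2 = v_2(x, r)$, such that
$$u_2^T v_2 > 0 \iff \big(\mathrm{dist}^2(p, x) - \mathrm{dist}^2(p', x) - (r - c)^2\big)^2 - \big(2(r - c)\,\mathrm{dist}(p', x)\big)^2 > 0. \qquad (97)$$
Let $o_1 = (0, \ldots, 0) \in \mathbb{R}^{d_1}$, $o_2 = (0, \ldots, 0) \in \mathbb{R}^{d_2}$, $u = u(p) = (u_1u_2)$, $v = v(x, r) = (v_1o_2)$, and $z = z(x, r) = (o_1v_2) \in \mathbb{R}^{d_1+d_2}$. By (96) we have
$$u^T v > 0 \iff \mathrm{dist}^2(p, x) - \mathrm{dist}^2(p', x) - (r - c)^2 > 0, \qquad (98)$$
and by (97)
$$u^T z > 0 \iff \big(\mathrm{dist}^2(p, x) - \mathrm{dist}^2(p', x) - (r - c)^2\big)^2 - \big(2(r - c)\,\mathrm{dist}(p', x)\big)^2 > 0. \qquad (99)$$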
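Suppose that $\mathrm{range}(x, r) \in R^+$. We now prove that
$$p \in \mathrm{range}(x, r) \iff \big(u^T v \le 0 \text{ or } u^T z \le 0\big). \qquad (100)$$
Indeed, since $r - c \ge 0$ (so both sides of the second inequality below are nonnegative and may be squared),
$$\begin{aligned}
p \in \mathrm{range}(x, r) &\iff \mathrm{dist}(p, x) - \mathrm{dist}(p', x) \le r - c\\
&\iff \mathrm{dist}(p, x) \le r - c + \mathrm{dist}(p', x)\\
&\iff \mathrm{dist}^2(p, x) \le (r - c)^2 + \mathrm{dist}^2(p', x) + 2(r - c)\,\mathrm{dist}(p', x)\\
&\iff \mathrm{dist}^2(p, x) - \mathrm{dist}^2(p', x) - (r - c)^2 \le 2(r - c)\,\mathrm{dist}(p', x). \qquad (101)
\end{aligned}$$
By (98),
$$\begin{aligned}
&\big(u^T v > 0 \text{ and } \mathrm{dist}^2(p, x) - \mathrm{dist}^2(p', x) - (r - c)^2 \le 2(r - c)\,\mathrm{dist}(p', x)\big)\\
&\iff \big(u^T v > 0 \text{ and } (\mathrm{dist}^2(p, x) - \mathrm{dist}^2(p', x) - (r - c)^2)^2 \le (2(r - c)\,\mathrm{dist}(p', x))^2\big)\\
&\iff \big(u^T v > 0 \text{ and } u^T z \le 0\big),
\end{aligned}$$
where the last step is by (99). By the last equation and (101),
$$\big(u^T v > 0 \text{ and } p \in \mathrm{range}(x, r)\big) \iff \big(u^T v > 0 \text{ and } u^T z \le 0\big). \qquad (102)$$
We have by (98)
$$u^T v \le 0 \implies \mathrm{dist}^2(p, x) - \mathrm{dist}^2(p', x) - (r - c)^2 \le 0 \le 2(r - c)\,\mathrm{dist}(p', x).$$
Combining this with (101) yields $u^T v \le 0 \implies p \in \mathrm{range}(x, r)$. Using the last equation with (102) proves (100).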

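Let $U = \{u(p) \mid p \in S\} \subseteq \mathbb{R}^{d_1+d_2}$. For every $v, z \in \mathbb{R}^{d_1+d_2}$, let
$$\mathrm{range}'(v, z) = \{u \in U \mid u^T v \le 0 \text{ or } u^T z \le 0\}.$$
By (100), $\mathrm{range}(x, r) = \mathrm{range}'(v(x, r), z(x, r))$. Hence,
$$|R^+| = |\{\mathrm{range}(x, r) \mid x \in X(j, 1), r - c \ge 0\}| \le |\{\mathrm{range}'(v, z) \mid v, z \in \mathbb{R}^{d_1+d_2}\}|.$$
It is not hard to verify that
$$|\{\mathrm{range}'(v, z) \mid v, z \in \mathbb{R}^{d_1+d_2}\}| \le |U|^{O(d_1+d_2)} = |S|^{O(d(j+1))}.$$
Combining the last two equations yields
$$|R^+| \le |S|^{O(d(j+1))}. \qquad (103)$$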
We now bound $|R^-|$ in a similar way. We have
$$\begin{aligned}
|R^-| &= |\{\mathrm{range}(x, r) \mid x \in X(j, 1), r - c < 0\}|\\
&= |\{\{p \in S \mid \mathrm{dist}(p, x) - \mathrm{dist}(p', x) \le r - c\} \mid x \in X(j, 1), r - c < 0\}|\\
&= |\{\{p \in S \mid \mathrm{dist}(p', x) - \mathrm{dist}(p, x) \ge |r - c|\} \mid x \in X(j, 1), r - c < 0\}|\\
&= |\{\{p \in S \mid \mathrm{dist}(p', x) - \mathrm{dist}(p, x) \ge r - c\} \mid x \in X(j, 1), r - c > 0\}|. \qquad (104)
\end{aligned}$$
For every $r \in \mathbb{R}$ and a set $Q = \{p \in S \mid \mathrm{dist}(p', x) - \mathrm{dist}(p, x) \ge r - c\}$ there is a corresponding distinct set $S \setminus Q = \{p \in S \mid \mathrm{dist}(p', x) - \mathrm{dist}(p, x) < r - c\}$. Hence,
$$|\{\{p \in S \mid \mathrm{dist}(p', x) - \mathrm{dist}(p, x) \ge r - c\} \mid x \in X(j, 1), r - c > 0\}| \le |\{\{p \in S \mid \mathrm{dist}(p', x) - \mathrm{dist}(p, x) < r - c\} \mid x \in X(j, 1), r - c > 0\}|. \qquad (105)$$
By replacing $p$ with $p'$ and $\mathrm{range}(x, r)$ with $\{p \in S \mid \mathrm{dist}(p', x) - \mathrm{dist}(p, x) < r - c\}$ in the proof of (100), we can bound the last term of (105) by $|S|^{O(d(j+1))}$. Together with (104), we obtain $|R^-| \le |S|^{O(d(j+1))}$. Plugging the last equation and (103) in (94) yields
$$|\{\mathrm{range}(x, r) \mid x \in X(j, 1), r \ge 0\}| \le |R^+| + |R^-| \le |S|^{O(d(j+1))}.$$
Since the last inequality holds for any $S \subseteq P$, the dimension of $\{f_p \mid p \in P\}$ is $O(d(j + 1))$. □
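The linearization above starts from the formula $\mathrm{dist}^2(p, x) = \|p - h_0\|_2^2 - \sum_{i=1}^{j}((p - h_0)^T h_i)^2$, which requires $h_1, \ldots, h_j$ to be orthonormal. The following Python sketch is an illustrative check with hypothetical helper names: it orthonormalizes random directions by Gram-Schmidt (assuming they are linearly independent), compares the formula with the squared distance to the orthogonal projection onto the flat, and confirms that random points of the flat are never closer.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors):
    """Gram-Schmidt; assumes the input vectors are linearly independent."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

def dist2_to_flat(p, h0, basis):
    """||p - h0||^2 - sum_i ((p - h0)^T h_i)^2 for orthonormal h_1..h_j."""
    diff = [a - b for a, b in zip(p, h0)]
    return dot(diff, diff) - sum(dot(diff, h) ** 2 for h in basis)

def point_on_flat(h0, basis, coef):
    """h0 + sum_i coef_i * h_i."""
    return [h0[k] + sum(a * h[k] for a, h in zip(coef, basis)) for k in range(len(h0))]

rng = random.Random(4)
dim, j = 5, 2
h0 = [rng.gauss(0, 1) for _ in range(dim)]
basis = orthonormalize([[rng.gauss(0, 1) for _ in range(dim)] for _ in range(j)])
p = [rng.gauss(0, 3) for _ in range(dim)]
# The closest point of the flat is h0 + sum_i ((p - h0)^T h_i) h_i.
closest = point_on_flat(h0, basis, [dot([a - b for a, b in zip(p, h0)], h) for h in basis])
others_ok = all(
    math.dist(p, point_on_flat(h0, basis, [rng.uniform(-5, 5) for _ in range(j)])) ** 2
    >= dist2_to_flat(p, h0, basis) - 1e-9
    for _ in range(200))
print(math.isclose(dist2_to_flat(p, h0, basis), math.dist(p, closest) ** 2), others_ok)
```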

The following lemma follows from the fact that every cell in an arrangement of balls in $\mathbb{R}^d$ corresponds to a different intersection of at most $O(d)$ balls; see [SA95].

Lemma 16.2 Let $\mathcal{A}$ be the arrangement of a set of $n$ open balls in $\mathbb{R}^d$. There is a set $V \subseteq \mathbb{R}^d$, $|V| \le n^{O(d)}$, that intersects every vertex, edge, face and cell of $\mathcal{A}$.

Recall that $X(j, k)$ was defined in Section 12.1 to be all the possible $k$-tuples of $j$-flats in $\mathbb{R}^d$.

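Lemma 16.3 Let $P$ be a set of points in $\mathbb{R}^d$, and $k, j \ge 1$. For every $p \in P$, let $s_p, c_p, z_p > 0$ and define $g_p : X(j, k) \to [0, \infty)$ as
$$g_p(x) = \begin{cases}c_p\,\mathrm{dist}(p, x) & z_p \le \mathrm{dist}(p, x) \le s_p\\ 0 & \text{otherwise,}\end{cases}$$
and let $G = \{g_p \mid p \in P\}$. Then $\dim(G) = O(dj)$.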

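Proof. We prove the lemma for the case $k = 1$. The case $k > 1$ then follows from Lemma 6.5. Put $S \subseteq P$.

For every $x \in X(j, 1)$ and $r > 0$, let
$$\mathrm{range}(x, r) = \{p \in S \mid g_p(x) \le r\} = \{p \in S \mid \mathrm{dist}^2(p, x) - s_p^2 > 0 \text{ or } \mathrm{dist}^2(p, x) - z_p^2 < 0 \text{ or } \mathrm{dist}^2(p, x) - r^2/c_p^2 \le 0\}. \qquad (106)$$
Let $r > 0$, $p \in S$, and $x \in X(j, 1)$. Since $x$ is a $j$-flat, there is a tuple of $j + 1$ vectors $h_0, \ldots, h_j \in \mathbb{R}^d$, where $h_1, \ldots, h_j$ are orthonormal, such that $x = \{h_0 + \sum_{i=1}^{j}a_i h_i \mid a_1, \ldots, a_j \in \mathbb{R}\}$, and
$$\mathrm{dist}^2(p, x) - r^2/c_p^2 = \|p - h_0\|_2^2 - \sum_{i=1}^{j}\big((p - h_0)^T h_i\big)^2 - r^2/c_p^2 = \|p - h_0\|_2^2 - \sum_{i=1}^{j}\big(p^T h_i - h_0^T h_i\big)^2 - r^2/c_p^2.$$
For two vectors $(m_1, \ldots, m_s) \in \mathbb{R}^s$ and $(y_1, \ldots, y_t) \in \mathbb{R}^t$, we denote by $my$ the tuple $(m_1, \ldots, m_s, y_1, \ldots, y_t)$. Let $h' = (1, r^2/c_p^2, h_0 \cdots h_j) \in \mathbb{R}^{d(j+1)+2}$, and $p' = (1, p) \in \mathbb{R}^{d+1}$. Hence,
$$\mathrm{dist}^2(p, x) - r^2/c_p^2 = \sum_{i_0, i_1\in[d+1],\ i_2, i_3\in[d(j+1)+2]}C_{i_0, i_1, i_2, i_3}\,p'_{i_0}p'_{i_1}h'_{i_2}h'_{i_3}, \qquad (107)$$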
where $C_{i_0, i_1, i_2, i_3}$ is a constant that depends only on $i_0, \ldots, i_3$, and equals zero for all except $d_1 = O(d(j + 1))$ terms of the summation.

Equation (107) implies that there are two $d_1$-dimensional vectors $u_1 = u_1(p)$ and $v_1 = v_1(x, r^2/c_p^2)$, such that
$$u_1^T v_1 \le 0 \iff \mathrm{dist}(p, x)^2 - r^2/c_p^2 \le 0. \qquad (108)$$
Similarly, we can prove that there are two $d_1$-dimensional vectors $u_2 = u_2(p)$ and $v_2 = v_2(x, z_p)$, such that
$$u_2^T v_2 < 0 \iff \mathrm{dist}(p, x)^2 - z_p^2 < 0, \qquad (109)$$
and that there are two $d_1$-dimensional vectors $u_3 = u_3(p)$ and $v_3 = v_3(x, s_p)$ such that
$$u_3^T v_3 > 0 \iff \mathrm{dist}(p, x)^2 - s_p^2 > 0. \qquad (110)$$
Let $o = (0, \ldots, 0) \in \mathbb{R}^{d_1}$. Let $u = u(p) = (u_1u_2u_3)$, $z_1 = z_1(x, r) = (v_1oo)$, $z_2 = z_2(x, r) = (ov_2o)$, $z_3 = z_3(x, r) = (oov_3)$ be vectors in $\mathbb{R}^{3d_1}$. By (106), (108), (109) and (110),
$$p \in \mathrm{range}(x, r) \iff \big(u^T z_1 \le 0 \text{ or } u^T z_2 < 0 \text{ or } u^T z_3 > 0\big). \qquad (111)$$
Let $U = \{u(p) \mid p \in S\}$. For every $z_1, z_2, z_3 \in \mathbb{R}^{3d_1}$ let
$$\mathrm{range}'(z_1, z_2, z_3) = \{u \in U \mid u^T z_1 \le 0 \text{ or } u^T z_2 < 0 \text{ or } u^T z_3 > 0\}.$$
It is not hard to verify that
$$|\{\mathrm{range}'(z_1, z_2, z_3) \mid z_1, z_2, z_3 \in \mathbb{R}^{3d_1}\}| \le |U|^{O(3d_1)} = |S|^{O(d(j+1))}. \qquad (112)$$
By (111), $\mathrm{range}(x, r) = \mathrm{range}'(z_1, z_2, z_3)$. Hence,
$$|\{\mathrm{range}(x, r) \mid x \in X(j, 1), r > 0\}| \le |\{\mathrm{range}'(z_1, z_2, z_3) \mid z_1, z_2, z_3 \in \mathbb{R}^{3d_1}\}|.$$
Using the last equation with (112) yields
$$|\{\mathrm{range}(x, r) \mid x \in X(j, 1), r > 0\}| \le |S|^{O(d(j+1))}.$$
Since the last inequality holds for any $S \subseteq P$, the dimension of $G = \{g_p \mid p \in P\}$ is $O(d(j + 1)) = O(dj)$. □
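The rewriting (106) of the sublevel set $\{p \mid g_p(x) \le r\}$ as a disjunction of three polynomial sign conditions can be checked exhaustively on a grid of values of $d = \mathrm{dist}(p, x)$ and $r > 0$. The illustrative Python sketch below uses exact fractions so that boundary cases are compared without rounding; the parameters $c_p$, $z_p$, $s_p$ are arbitrary sample values.

```python
from fractions import Fraction
from itertools import product

def g_value(d, c_p, z_p, s_p):
    """g_p as a function of d = dist(p, x): c_p * d if z_p <= d <= s_p, else 0."""
    return c_p * d if z_p <= d <= s_p else 0

def in_range_106(d, r, c_p, z_p, s_p):
    """Membership test given by the right-hand side of (106)."""
    return d * d - s_p * s_p > 0 or d * d - z_p * z_p < 0 or d * d - r * r / (c_p * c_p) <= 0

grid = [Fraction(k, 4) for k in range(0, 25)]   # candidate values for d and r
c_p, z_p, s_p = Fraction(3, 2), Fraction(1), Fraction(4)
mismatches = [
    (d, r) for d, r in product(grid, grid[1:])  # r > 0 as in the proof
    if (g_value(d, c_p, z_p, s_p) <= r) != in_range_106(d, r, c_p, z_p, s_p)
]
print(len(mismatches))  # prints: 0
```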

Theorem 16.4 (strong coresets for $k$-median in $\mathbb{R}^d$) Let $P$ be a set of $n$ points in $\mathbb{R}^d$. Let $k \ge 1$ be an integer, $0 < \varepsilon, \delta < 1/2$, and
$$t = \frac{c}{\varepsilon^2}\Big(dk + \log\frac{1}{\delta}\Big),$$
where $c$ is a sufficiently large constant. Then, a set $D \subseteq P$ of size $|D| = t$, with a weight function $w : D \to [0, \infty)$, can be computed such that, with probability at least $1 - \delta$,
$$\forall x \in (\mathbb{R}^d)^k:\ \Big|\sum_{p\in P}\mathrm{dist}(p, x) - \sum_{p\in D}w(p)\,\mathrm{dist}(p, x)\Big| \le \varepsilon\sum_{p\in P}\mathrm{dist}(p, x).$$
The running time is $O(ndk + \log^2(1/\delta)\log^2 n + k^2 + t\log n)$.

Proof. The proof is the same as the proof of Theorem 15.4, except for the computation of $\dim(L(P))$. In this case, we have $\dim(L(P)) = O(kd)$ instead of $\dim(L(P)) = O(k\log n)$, as proved in Lemma 16.3. □



17 $k$-Line Median



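Theorem 17.1 (Strong coreset for $k$-lines in $\mathbb{R}^d$) Let $P \subseteq \mathbb{R}^d$, $k \ge 1$, $0 < \varepsilon, \delta < 1/2$, $r = k + \log(1/\delta)$ and
$$t \ge \frac{c}{\varepsilon^2}\Big(dk + \log\frac{1}{\delta}\Big)$$
for a sufficiently large constant $c$. A set $D$ of $O(t) + ((1/\varepsilon)\log n)^{O(k)}$ points and a weight function $w : D \to (-\infty, \infty)$ can be computed in $O(ndk) + O(dr^2) + r^{O(k)}\log^2 n$ time, such that, with probability at least $1 - \delta$, for every set $x$ of $k$ lines in $\mathbb{R}^d$,
$$\Big|\sum_{p\in P}\mathrm{dist}(p, x) - \sum_{p\in D}w(p)\,\mathrm{dist}(p, x)\Big| \le \varepsilon\sum_{p\in P}\mathrm{dist}(p, x). \qquad (113)$$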



Proof. Let $r = k + \log(1/\delta)$. By Theorem 12.12, a set $B$ of $O(k\log n)$ lines that satisfies
$$\mathrm{cost}(P, B) \le O(1)\min_{x^*}\mathrm{cost}(P, x^*)$$
can be computed, with probability at least $1 - \delta$, in time
$$\mathrm{Bicriteria} = O(ndk) + O(dr^2) + r^{O(k)}\log^2 n.$$
Assume that this event indeed occurs.

Let $(D', S, w')$ be the output of a call to the algorithm $\textsc{Metric-B-Coreset}(P, B, t, \varepsilon/c)$. For every $S \subseteq L(P)$, let $X(S) = X(2, k)$ denote all the possible lines in $\mathbb{R}^d$. By Lemma 16.3, we have that $\dim(L(P), X) = O(dk)$. By Theorem 14.7, with probability at least $1 - \delta$,
$$\forall x \in X(2, k):\ \Big|\mathrm{cost}(P, x) - \Big(\mathrm{cost}(\mathrm{proj}(P, B), x) + \sum_{p\in S}w'(p)\,\mathrm{dist}(p, x) - \sum_{p\in S}w'(p)\,\mathrm{dist}(\mathrm{proj}(p, B), x)\Big)\Big| \le \varepsilon\,\mathrm{cost}(P, B). \qquad (114)$$



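Using the result from [FFS06], a set $C$, $|C| = |B|\cdot((1/\varepsilon)\log n)^{O(k)}$, with a weight function $u : C \to [0, \infty)$ can be constructed in $O(ndk)$ time such that
$$\forall x \in X(2, k):\ \Big|\mathrm{cost}(\mathrm{proj}(P, B), x) - \sum_{p\in C}u(p)\,\mathrm{dist}(p, x)\Big| \le \varepsilon\,\mathrm{cost}(\mathrm{proj}(P, B), x).$$
We have $\mathrm{dist}(\mathrm{proj}(p, B), x) \le \mathrm{dist}(\mathrm{proj}(p, B), p) + \mathrm{dist}(p, x) = \mathrm{dist}(p, B) + \mathrm{dist}(p, x)$ for every $p \in P$. Summing over every $p \in P$ yields $\mathrm{cost}(\mathrm{proj}(P, B), x) \le \mathrm{cost}(P, B) + \mathrm{cost}(P, x)$. Hence,
$$\forall x \in X(2, k):\ \Big|\mathrm{cost}(\mathrm{proj}(P, B), x) - \sum_{p\in C}u(p)\,\mathrm{dist}(p, x)\Big| \le O(\varepsilon)\,\mathrm{cost}(P, B) + O(\varepsilon)\,\mathrm{cost}(P, x). \qquad (115)$$
Let $D = C \cup S \cup \mathrm{proj}(S, B)$ and
$$w(p) = \begin{cases}u(p) & p \in C\\ w'(p) & p \in S\\ -w'(q) & p = \mathrm{proj}(q, B),\ q \in S.\end{cases}$$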
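Using the triangle inequality,
$$\begin{aligned}
\forall x \in X(2, k):\ &\Big|\sum_{p\in P}\mathrm{dist}(p, x) - \sum_{p\in D}w(p)\,\mathrm{dist}(p, x)\Big|\\
&= \Big|\mathrm{cost}(P, x) - \sum_{p\in C}u(p)\,\mathrm{dist}(p, x) - \sum_{p\in S}w'(p)\,\mathrm{dist}(p, x) + \sum_{p\in S}w'(p)\,\mathrm{dist}(\mathrm{proj}(p, B), x)\Big|\\
&\le \Big|\mathrm{cost}(P, x) - \Big(\mathrm{cost}(\mathrm{proj}(P, B), x) + \sum_{p\in S}w'(p)\,\mathrm{dist}(p, x) - \sum_{p\in S}w'(p)\,\mathrm{dist}(\mathrm{proj}(p, B), x)\Big)\Big|\\
&\quad + \Big|\mathrm{cost}(\mathrm{proj}(P, B), x) - \sum_{p\in C}u(p)\,\mathrm{dist}(p, x)\Big|.
\end{aligned}$$
Together with (113), (114) and (115) this proves the theorem as
$$\forall x \in X(2, k):\ \Big|\sum_{p\in P}\mathrm{dist}(p, x) - \sum_{p\in D}w(p)\,\mathrm{dist}(p, x)\Big| \le \varepsilon\,\mathrm{cost}(P, B) + O(\varepsilon)\,\mathrm{cost}(P, B) + O(\varepsilon)\,\mathrm{cost}(P, x) \le O(\varepsilon)\,\mathrm{cost}(P, x).$$
□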



18 5-Coresets for Projective Clustering 